ABSTRACT:  The  implementation  of  a  processor  generator  for  student
languages  is  presented.  Included  are  schemes  for  the  formal
specification  of  a  language  by  its  syntax  and  semantics,  for
the  automatic  construction  of  a  symbol  table,  and  for  the  production
of  the  driving  arrays  for  a  table-driven  scanner,
parser  and  code  emitter.

In  addition,  a  pseudo-machine  designed  to  facilitate 
semantics  specifications  and  to  provide  a  high  diagnostic 
capability  is  examined.


PREFACE


The  compiler  generating  system  and  companion  emulator 
described  in  this  thesis  were  created  for  the  Teaching  Languages 
Project  of  the  Computer  Systems  Research  Group  at  the  University  of 
Toronto.

The  system  is  designed  to  serve  as  both  a  batch  processor 
for  programs  written  in  a  variety  of  student  languages,  and  a  practical 
tool  for  language  development.  In  this  latter  role,  the  system 
provides  a  previously  unavailable  facility  for  the  creation  of  a  working 
processor  for  a  proposed  language,  thus  permitting  trial  usage  of  the 
language  by  its  intended  users  and  allowing  modifications  to  be  quickly 
made.

The  system  is  described  briefly  in  Chapter  1.  A  very  brief 
review  of  language  processors  is  presented  in  Chapter  2,  leading  to 
the  design  considerations  for  this  particular  system  in  Chapter  3.  In 
Chapter  4,  the  actual  implementation  is  discussed  in  detail,  and  in 
Chapter  5  conclusions  are  drawn  as  to  the  merits  of  this  project. 

The  system  as  described  in  the  thesis  proper  is  not  complete 
at  the  time  of  writing  this  thesis;  those  features  still  under 
development  are  tabulated  in  the  appendices.  However,  the  design  of  the 
system  described  herein  is  finalized,  and  the  processor  generator  will 
be  completed  under  the  auspices  of  the  Teaching  Languages  Project.


CHAPTER  1


INTRODUCTION 


Since  the  inception  of  programmable  computers,  men  have  sought 
to  make  the  programming  of  these  machines  an  easier  task.  This  desire 
has  resulted  in  the  creation  of  assembly  languages,  and  the  so-called 
higher  level  languages  more  closely  resembling  either  natural  languages 
(as  COBOL  resembles  English)  or  the  formulae  of  mathematics.  From  this 
the  need  has  arisen  for  translation  programs  to  convert  the  source 
programs  into  machine  code. 

The  simpler  translators  are  referred  to  as  assemblers,  while 
the  more  elaborate  processors  for  the  higher  level  languages  are  called 
compilers.

The  task  of  writing  these  translators  has  been  very  time- 
consuming  and  difficult.  It  is  natural,  then,  to  extend  the  concepts 
of  compiler  translation  to  the  coding  of  compilers  themselves,  for  the 
compilation  process  should  be  describable  in  a  compiler  language  just 
as  mathematical  operations  are  describable  in  the  so-called  mathematical 
languages.

This  has  led  to  the  development  of  the  compiler  compilers, 
programs  which  take  as  input  the  syntactic  and  semantic  specification 
of  languages  and  which  produce  as  output  compilers  which  can  read  an 
input  program,  parse  it  according  to  the  syntactic  rules  specified,
and  emit  the  code  required  by  the  semantics  rules.  Unfortunately,  the
semantics  capability  of  most  existing  compiler  compilers  is  deficient, 
or  even  non-existent. 

This  weakness  is  attributable  primarily  to  the  difficulty  in 
devising  a  sufficiently  powerful  semantics  language  to  deal  with  the 
very  broad  range  of  object  machines  for  which  code  must  be  emitted. 

If  a  particular  object  machine  for  which  a  given  compiler  compiler  must 
emit  code  is  defined,  then  a  semantics  specification  language  can  be 
devised,  using  as  the  semantics  language  the  machine  language  of  that 
object  machine.  Unfortunately,  incorporating  the  detailed  structure 
of  the  actual  machine  code  for  the  hardware  is  extremely  unwieldy, 
and  compilers  that  produce  actual  machine  code  tend  to  be  formula 
translators.  The  alternative  approach  of  using  a  pseudo-machine  code 
as  the  semantics  language,  and  then  creating  an  emulator  to  execute 
object  programs  written  in  the  pseudo-machine  language,  is  discussed 
below. 

The  ability  of  a  compiler  compiler  to  generate  language 
translators  in  an  automatic  way  has  obvious  advantages  in  an  educational 
environment,  where  several  specialized  pedagogical  languages,  as  well 
as  the  common  commercially  supported  languages,  are  taught.  Here, 
compiler  compilers  can  be  used  to  generate  processors  for  an  acceptable 
set  of  languages.  The  common  object  machine  into  whose  language  these 
source  programs  are  compiled  may  be  a  pseudo-machine,  especially 
designed  for  the  excellent  diagnostic  capability  and  high  compilation
rate  necessary  for  processing  student  jobs.

The  design  and  implementation  of  a  language  processor 
generating  system  consisting  of  a  compiler  compiler  whose  output  is 
merged  with  an  emulator  for  a  pseudo-machine  is  certainly  possible  and
its  value  in  an  educational  environment  is  indeed  great.

The  system  described  in  the  following  chapters  was  developed 
as  part  of  the  Teaching  Languages  Project  of  the  Computer  Systems 
Research  Group.  This  system,  which  is  at  the  time  of  writing  of  this 
thesis  in  a  partially  implemented  state  (as  described  later),  is 
designed  to  process  student  jobs  written  in  a  variety  of  languages  in 
a  batch  mode.  The  resident  processor  uses  in-core  compilation  and 
execution  to  eliminate  the  time-consuming  storage  and  recall  of  object 
programs.  It  supervises  all  input/output  activity  of  the  student 
jobs,  as  well  as  enforcing  time  and  output  limits,  and  is  designed  so 
as  to  minimize  the  possibility  of  the  processor  violating  either  the 
Supervisor  or  another  problem  partition  when  run  in  a  multiprogramming 
environment  on  one  of  the  University  of  Toronto  Computing  Centre's 
machines.

The  development  of  the  system  is  presented  chronologically, 
with  a  preliminary  discussion  of  the  previous  work  in  this  field, 
followed  by  the  design  considerations  peculiar  to  this  system.  The 
actual  implementation  of  the  system  is  then  presented,  followed  by  a 
summary  of  the  work  done.  Before  a  detailed  study  is  presented,  however, 
two  simple  block  diagrams  of  the  operation  of  the  system  are  given.  The
first,  Fig.  1.1,  describes  the  generation  of  a  language  processor;  the
second,  Fig.  1.2,  describes  the  execution  of  student  jobs  by  that
processor.


Figure  1.1 


Generation  of  a  Language  Processor


CHAPTER  2


AN  HISTORICAL  SURVEY 


2.1  FORMAL  DESCRIPTION  OF  LANGUAGES 


A  language,  be  it  a  natural  language  or  a  formal,  or  artificial,
one,  has  two  aspects:  syntax  and  semantics.

The  syntactic  specification  of  a  context-free  language,  being
simply  the  specification  of  the  permissible  sequences  of  words,  or  tokens,
in  the  language,  has  been  possible  using  Backus  Naur  form  for  several  years
[BA1].  Other  forms,  such  as  graphs,  have  also  been  postulated  [FE2],  but
all  of  these  methods  of  specifying  syntax  rely  on  the  fact  that  syntax  is
an  abstract  concept;  it  is  a  series  of  rules  for  forming  sequences  of
tokens,  or,  conversely,  for  recognizing  sequences  of  tokens.  The  tokens
themselves  are  simply  labels,  and  the  syntax  of  a  language  is  independent
of  the  entities  these  labels  represent.  What  the  labels  represent  is  the
semantics  of  the  language.

Semantics  in  general  has  not  been  rigorously  definable,  and
indeed  it  cannot  be,  owing  to  its  dependence  on  some  ultimate  reference
that  has  not  as  yet  been  agreed  upon.  Certainly  the  adoption  of  a
Universal  Machine  which  could  perform  any  describable  task  in  a  well-
defined  and  unambiguous  way,  and  whose  description  was  equally  rigorous,
would  permit  a  linguist  to  describe  semantics  in  a  frame  understood  by
all.  A  Turing  machine  would  do  for  the  computer  scientist.  However,
such  a  machine  is  impractical  to  say  the  least,  and  semantics  specifications
for  formal  languages  must  resort  to  appeals  to  global  concepts  expressed
in  a  natural  language.  For  example,  the  Algol  68  report  gives  semantics  for
the  syntax  <integer>  ::=  <integer>  +  <integer>  as  'the  result  obtained  "in
the  sense  of  numerical  analysis"'.¹

An  alternative  approach  has  been  to  select  a  well-known  formal 
language  (such  as  Algol)  to  define  the  semantics  of  statements  in  a  new 
language.  This  technique  depends  on  the  fact  that  the  semantics  of  the 
base  language  are  well-defined,  and  this  is  not  always  the  case  (in  Algol, 
for  example,  the  semantic  action  for  the  addition  of  a  character  string 
and  a  number  is  dependent  on  the  compiler  used).

No  matter  what  approach  to  semantics  specification  is  used, 
it  must  be  remembered  that  semantics  are  well-defined  only  when  related 
to  the  actions  of  some  predictable  machine  (be  it  real  or  pseudo),  and
that,  therefore,  in  specifying  semantics,  one  must  first  specify  the 
device  which  will  execute  the  semantics. 

¹  Van  Wijngaarden,  A.,  Mailloux,  B.J.,  Peck,  J.E.,  and  Koster,  C.H.;
REPORT  ON  THE  ALGORITHMIC  LANGUAGE  ALGOL  68;  The  Mathematical  Centre,
Amsterdam,  Feb.  1969.

2.2  COMPILERS


In  the  beginning,  there  was  machine  language.  It  was  not  long
before  computer  linguists  were  writing  assemblers  to  make  source  programs
easier  to  code,  and  such  a  philosophy  led  naturally  to  the  creation  of
high-level  languages  and  their  compilers.  While  the  early  compilers  all
adopted  ad  hoc  techniques  for  translating  from  a  particular  source  language
to  a  particular  object  language,  the  formalization  of  language  syntax
specifications  permitted  various  parsing  techniques  to  be  developed  and
used  in  compiler  writing.  Included  here  are  such  things  as  table-driven
parsers,  which  were  used  in  many  early  compilers  and  are  still  being
revisited  [FE1],  the  stack  algorithms  of  Irons  [IR1],  the  transition  matrices
of  Gries  [GR1],  and  recently  the  deterministic  pushdown  automata  (DPDA)
parsers  of  DeRemer  [DR1].

However,  prior  to  the  advent  of  compiler  compilers  (discussed 
below),  all  of  these  techniques  still  required  the  manual  creation  of 
parsing  tables  by  the  compiler  writer.  Further,  symbol  table  organization 
and  algorithms,  although  categorized,  were  invariably  coded  by  each  compiler 
writer,  as  were  the  scan  routines  and  code  generators  and  emitters.  Each 
major  language  spawned  a  variety  of  dialects  and  special  compilers.  There 
were  standard  FORTRAN  compilers,  fast  FORTRAN  compilers,  diagnostic  FORTRAN 
compilers,  and  optimizing  FORTRAN  compilers.  Many  seemed  reconciled  to 
this  myriad  of  languages  and  compilers,  for  to  devise  a  new  language  and  then 
to  write  its  compiler  was  beyond  the  scope  of  anyone  not  prepared  to  invest 
at  least  a  man-year  of  work  (unless  he  could  make  do  with  modifying  an
existing  compiler).

2.3  EMULATORS


"Divide  and  conquer"  has  always  been  a  good  strategy  in  computer 
programming.  The  philosophy  of  subdivision  and  pipeline  processing 
permits  several  people  to  work  simultaneously  on  different  sections  of  a 
large  program  and  it  permits  one  person  to  debug  different  sections  one  at 
a  time.  In  addition,  pipeline  processing  is  often  a  natural  model  for  a 
multi-stage  process  such  as  language  translation  where  an  intermediate 
language  (or  languages)  may  be  desirable.  It  was  this  consideration  that 
led  to  the  development  of  emulators  or  pseudo-machines  -  'computers'  built
with  software,  not  hardware,  so  that  the  translation  from  source  program 
to  object  is  buffered  by  an  intermediate  language. 

If  a  compiler  has  as  its  target  language  something  akin  to  its 
source  language,  the  process  of  compilation  is  generally  simplified. 
Consequently,  target  machines  are  often  devised  with  a  higher-level 
instruction  set  than  that  of  the  actual  hardware,  and  then  an  emulator 
program  is  written  to  execute  this  high-level  'object'  code.  The 
relative  complexity  of  compiler  and  emulator  is  quite  variable;  the 
extremes  are  a  compiler  which  outputs  actual  machine  code  so  that  the 
only  emulator  is  in  the  hardware  itself  and,  to  the  programmer,  there 
is  no  emulator;  and,  at  the  other  extreme,  an  emulator  whose  instruction 
repertoire  consists  of  statements  in  the  source  language,  so  that  there 
is  no  compiler.  The  optimal  division  between  compiler  and  emulator 
complexity  can  be  made  only  by  consideration  of  the  source  language, 
target  language,  type  of  jobs,  and  relative  frequencies  of  compilations
versus  executions.  An  additional  consideration,  if  several  high-level
languages  are  to  be  processed,  is  that  by  using  a  common,  powerful 
emulator,  the  compilers  themselves  may  be  generated  more  easily. 
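
To make the division of labour between compiler and emulator concrete, the following is a minimal sketch of a pseudo-machine emulator in Python. It is not the emulator described in this thesis: the instruction names (PUSH, ADD, MUL, PRINT), the tuple encoding of instructions, and the function name `emulate` are all assumptions chosen for illustration. It shows only that 'object' code for a pseudo-machine can sit above the level of hardware instructions, with a small interpreter supplying the execution.

```python
def emulate(program):
    """Execute a list of pseudo-machine instructions and return the printed values.

    Each instruction is a tuple: an operation name followed by its operands.
    The instruction set is hypothetical and deliberately tiny.
    """
    stack = []    # operand stack of the pseudo-machine
    output = []   # values produced by PRINT, collected for the caller
    for address, instruction in enumerate(program):
        op, *operands = instruction
        if op == "PUSH":
            stack.append(operands[0])
        elif op in ("ADD", "MUL"):
            if len(stack) < 2:
                # The emulator reports the fault in its own terms.
                raise RuntimeError(f"instruction {address}: {op} needs two operands")
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right if op == "ADD" else left * right)
        elif op == "PRINT":
            if not stack:
                raise RuntimeError(f"instruction {address}: nothing to print")
            output.append(stack.pop())
        else:
            raise RuntimeError(f"instruction {address}: unknown operation {op!r}")
    return output


# Hypothetical object code a compiler might emit for: print (2 + 3) * 4
object_code = [("PUSH", 2), ("PUSH", 3), ("ADD",), ("PUSH", 4), ("MUL",), ("PRINT",)]
print(emulate(object_code))
```

Raising the level of the instruction set (for example, a single instruction per source statement) would move work from the compiler into `emulate`; lowering it would do the reverse.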

Because  emulators  are  very  machine-dependent  and  are  logically 
constructed  after  a  compiler  has  been  designed,  their  creation  is  more 
of  an  art  than  a  science,  which  is  to  say  that  they  rely  more  on  ad  hoc 
techniques  than  on  a  few  universally  accepted  concepts.  This  is 
probably  related  to  the  fact  that  an  emulator  really  gives  semantic
meaning  to  a  language,  while  the  compiler  in  general  is  concerned  with 
syntax.  This  division  is  not  a  necessary  one  but  is  often  a  natural  one. 

For  reasons  that  will  be  discussed  in  Chapter  3,  compiling  to 
a  pseudo-machine  and  then  emulating  this  machine  has  several  advantages 
for  an  educational  environment.  Examples  of  the  use  of  such  emulators 
are  the  SL/I  system  for  the  IBM  1130,  the  SPL  batch  processor  developed 
at  Stanford  University  [WO1],  and  the  PLUTO  emulator  developed  at  the
University  of  Toronto  [PU1].  However,  all  the  emulators  named  above
have  one  thing  in  common:  they  are  pseudo-machines  created  to  execute 
a  particular  object  language,  which  in  turn  was  created  to  be  a  translation 
of  a  particular  source  language,  namely  a  PL/I  subset,  SPL  and  PL/I, 
respectively. 

2.4  COMPILER  GENERATING  SYSTEMS 


With  the  proliferation  of  computer  languages  in  the  late  1950's
and  early  1960's,  there  was  naturally  an  accompanying  array  of  compilers
produced.  By  that  time,  as  was  discussed  earlier,  several  standard
techniques  of  compilation  were  known  and  the  task  of  writing  a  compiler 
became  a  straightforward  and  rather  tedious  operation.  Just  as  impatience 
on  the  part  of  programmers  and  the  ever-lowering  cost  of  hardware  (and
consequently  computer  time)  relative  to  the  cost  of  programming  had 
stimulated  the  adoption  of  high-level  languages,  so  too  did  these  causes 
then  encourage  the  development  of  automatic  compiler  generators,  the 
compiler  compilers. 

A  compiler  compiler  offers  a  computer  user  the  opportunity  to 
devise  his  own  language  and,  by  simply  specifying  its  syntax  and  semantics, 
to  obtain  a  working  compiler  for  that  language.  The  ease  of  these 
specifications,  the  diagnostics  produced  for  any  ambiguities  or
inconsistencies  detected,  and  the  efficiency  and  (more  important)  the  precision
of  the  emitted  compiler  and  the  code  it  will  produce,  are  the  criteria 
for  judging  a  compiler  compiler.  The  best  known  of  the  early  compiler 
compiler  systems  is  probably  that  written  in  Atlas  Autocode  by  Brooker, 
Morris  et  al  [BR1]. 

In  this  system,  the  syntax  of  a  language  is  given  in  a  BNF- 
like  set  of  productions  for  the  statements  of  a  phrase-structure  language 
and  the  semantics  is  given  by  a  corresponding  set  of  routines  which 
define  the  machine  language  to  be  emitted  for  a  recognized  statement. 

In  addition,  built-in  routines  handle  labels  and  branches,  listings  and
the  like,  so  that  such  housekeeping  problems  are  avoided.  This  compiler  compiler
contains  several  concepts  common  to  succeeding  systems,  the  obvious  ones
being  the  basic  format  of  making  syntactic  productions  the  basic  elements
of  the  language  and  associating  semantics  with  them,  and  the  intrinsic
extensibility  of  the  system  to  permit  a  user  to  add  routines  if  those 
supplied  are  not  sufficient.  However,  it  is  extremely  statement-oriented,
with  a  global  symbol  table  but  no  provision  for  program  structure  or 
lexic  levels  and  the  accompanying  concepts  of  scope  for  symbols.  Code 
emission  is  sequential  as  statements  are  recognized,  and  there  exists  no 
parse  stack  or  other  such  mechanism  for  subordinating  some  statements  to 
others,  such  as  in  DO  groups,  IF  THEN  ELSE  sequences,  and  other  now 
common  constructions.  Nevertheless,  it  is  still  of  considerable  academic 
interest.

A  more  recent  compiler  compiler  is  the  XPL  system  created  at 
Stanford  [MK1]  which  was  employed  as  the  initial  base  for  this  project. 
Using  XPL,  the  syntax  of  a  language  specified  in  BNF  is  fed  to  a  program 
called  ANALYZER  which,  for  MSP (2,1; 1,1)  languages,  produces  a  set  of 
parsing  tables  which,  when  combined  with  a  framework  (called  SKELETON) 
written  in  XPL,  provide  an  XPL  source  program  which  compiles  to  produce  a 
recognizer  for  the  specified  language.  It  is  the  user's  responsibility 
to  code  in  XPL  and  to  add  to  this  recognizer  his  own  scanner,  symbol 
table  and  associated  routines,  code  synthesizers  and  code  emitters. 

While  the  XPL  system  offers  an  automatically  generated  recognizer  with 
the  framework  for  the  rest  of  the  compiler,  and  a  compiler  for  XPL 
source  that  produces  efficient  /360  machine  code,  it  does  stop  short 
of  being  a  proper  compiler  compiler.  It  could  be  argued  that  this 
provides  desirable  freedom  in  designing  the  rest  of  the  compiler,
especially  in  the  choice  of  object  code  emitted,  and  this  is  of  course
true.  However,  the  ability  to  specify  semantics  in  a  high-level
metalanguage,  or  as  a  series  of  macros,  is  a  necessity  if  the  goal  of  a
compiler  compiler  system  is  to  be  realized:  namely,  the  automatic
generation  of  a  complete  compiler,  given  only  the  specification  of  the  source
language.

Such  an  extension  of  the  XPL  project  has  one  possible  disadvantage:
for  a  given  compiler  compiler  the  target  or  object  language  must  be  fixed
to  give  a  basis  on  which  to  specify  semantics.  This  restraint  is  of  no 
consequence,  however,  if  the  target  language  is  object  code  for  a  high- 
level  pseudo-machine,  and,  in  fact,  the  inclusion  of  the  pseudo-machine 
emulator  in  the  generated  compiler  source  program  yields  a  complete 
language  processor  which,  once  compiled,  will  compile  and  execute  source 
programs  written  in  the  specified  language.  Thus,  the  consequence  of 
extending  the  theories  of  compiler  compilers  is  to  conceive  of  a 
complete  language  processor  generator  which  can  produce  compatible  batch 
processors  for  a  variety  of  languages. 

This  thesis  describes  the  design  and  implementation  of  such  a
generator.




CHAPTER  3 


DESIGN  CONSIDERATIONS 


3.1  ENVIRONMENT 

As  mentioned  in  the  introduction,  this  system  has  been 
designed  to  generate  language  processors  for  batches  of  jobs  written 
in  a  variety  of  student  languages,  and  run  in  a  dedicated  partition 
of  a  multiprogramming  machine.

For  this  environment,  it  must  be  highly  diagnostic  and  deal 
with  errors  in  terms  which  can  be  understood  by  a  beginning  programmer, 
rather  than  merely  producing  an  ABEND  code  and  a  hexadecimal  dump.  It 
must  be  very  secure  so  that  no  user  can  destroy  any  part  of  the  processor 
itself,  or  cause  the  processor  to  affect  other  jobs  or  the  supervisor  in 
the  host  machine.  Further,  as  most  student  jobs  are  only  compiled  until 
they  are  debugged,  and  then  are  executed  just  once,  the  compile  time 
must  be  extremely  fast,  even  though  execution  times  may  be  relatively 
slow  in  favour  of  increased  diagnostic  capability.  The  specifications 
of  syntax  and  semantics  should  be  straightforward  and  high-level,  and, 
although  most  concepts  in  present  day  languages  of  the  PL/I  type  should 
be  accessible  at  this  high-level,  the  generator  must  still  be  extensible 
to  incorporate  the  innovations  of  the  language  designer.  Finally,
the  additional  requirement  exists  that  the  system  be  created  in  finite
time  by  a  relatively  small  group,  such  that  a  prototype  system  can
be  operating  after  two  or  three  man-years  of  work. 

These  basic  requirements  have  several  immediate  effects  on  the 
implementation  scheme  for  the  system.  The  languages  to  be  initially 
supported  are  FORTRAN  IV,  Algol,  and  a  rich  subset  of  PL/I,  as  well  as 
various  simplified  dialects  of  these  for  teaching  purposes.  Thus, 
program  structure  of  the  type  found  in  Algol  or  PL/I  has  to  be
intrinsically  supported,  as  do  corresponding  concepts  of  scope  for  symbols,
recursive  procedures,  data  types  and  at  least  primitive  structures,  such 
as  arrays,  and,  of  course,  a  standard  set  of  arithmetic,  logical  and 
character  operations  together  with  some  (albeit  basic)  input/output 
facility.  Some  macro  capability  should  be  incorporated  to  permit  the 
expansion  of  the  pseudo-machine  in  terms  of  its  primitive  operations. 

In  addition,  the  generated  compiler  should  be  in  a  form  easily  modifiable 
by  a  user;  in  other  words,  in  some  source  language  or  at  least  a  high- 
level  pseudo-machine  language,  rather  than  simply  in  a  hexadecimal 
machine  language. 

To  facilitate  the  creation  of  the  system,  and  to  permit  it  to 
be  maintained  and  modified  by  other  than  the  original  programmers,  it 
was  decided  to  write  all  parts  of  it  in  a  high-level  language.  Further, 
it  was  decided  that  a  logical  extension  of  this  policy  was  to  have  the 
generated  compilers  also  produced  as  source  programs  in  a  high-level 
language.

The  actual  base  language  chosen,  the  subdivision  of  the  processor
generation  process,  and  the  design  constraints  on  the  target  language  and
emulator  are  now  discussed;  a  more  detailed  study  of  the
actual  implementation  follows  in  the  next  chapter.

3.2  BASE  LANGUAGE 

In  choosing  a  language  in  which  to  write  the  processor  system, 
none  of  the  many  existing  languages  was  considered  to  be  completely 
satisfactory.  The  most  powerful  considered  was  PL/I,  which  is  a  well-known 
language  and  for  which  a  working  compiler  exists.  However,  PL/I  is  rather 
too  complex  for  the  needs  of  this  project,  and  is  certainly  slow  in 
compilation  and  execution  compared  to  other  languages  available.  Further, 
at  the  inception  of  this  project,  in  late  November  1969,  the  PL/I 
compiler  available  from  IBM  was  not  reliable  and  it  was  felt  that  a  poor 
base  language  compiler  could  easily  delay  the  project  for  many  months. 

The  only  other  existing  language  seriously  considered  was  XPL,  a  PL/I
subset  developed  at  Stanford  University,  which  has  a  compiler  written 
in  XPL  itself  which  produces  /360  object  machine  code.  This  language  was, 
in  fact,  devised  for  compiler  writing  and  has  been  shown  to  be  extremely 
practical  for  this  by  Wortman  [WO1]  with  his  SPL  language  batch  processor.
It  is  much  faster  at  both  compilation  and  execution  than  PL/I  and  is 
more  efficient  in  its  core  utilization.  It  does,  however,  have  several 
disadvantages  in  comparison  to  PL/I. 

The  three  most  serious  disadvantages  of  XPL  are  that  there  is  a 
limit  of  1024  character  strings  in  any  program,  that  no  recursion  in
procedures  is  permitted,  and  that  there  is  no  link  edit  capability  between
XPL  object  modules  themselves  or  between  XPL  modules  and  standard  OS/360
modules.  The  only  solution  was  to  create  our  own  base  language  (just  as
XPL  was  created  to  write  SPL),  but  this  would  delay  the  project.  Therefore,
the  base  language  chosen  was  a  modified  XPL  with  at  least  the  three
deficiencies  mentioned  above  corrected;  in  the  meantime,  the  project  would  be
written  partly  in  PL/I  and  partly  in  XPL,  and  the  minor  conversions  to 
the  new  XPL-plus  or  BPL  made  when  this  language  became  operational.  It 
was  also  decided  that,  for  universality,  the  emitted  processors  would 
be  in  BPL  source  language.  Thus,  all  phases  of  the  project  have  yielded 
programs  in  a  language  compilable  to  /360  object  language  for  maximum 
efficiency  at  execution. 

However,  for  reasons  already  discussed,  the  generated  processors 
emit  a  pseudo-machine  object  code,  and  this  is  discussed  in  a  later
section  in  this  chapter. 

3.3  SUB-DIVISION  OF  THE  PROCESSOR 


The  compilation  process  can  naturally  be  divided  into  four 
sections:  scanner,  parser,  symbol  table  routines,  and  code  generator. 
Source  statements  are  scanned  and  tokens  passed  to  the  parser,  which 
then  recognizes  syntactic  constructions  and  activates  the  code  emission 
routine  for  a  given  syntactic  production.  The  symbol  table  routines  are 
used  by  all  three  other  sections;  symbols  are  loaded  in  the  table  when
scanned,  attributes  are  added  during  parsing,  and  this  information  is
used  by  the  code  generator.  In  the  automatic  generation  of  a  compiler, 
this  natural  division  seemed  the  logical  one  to  use  in  specifying  the 
syntax  and  semantics  of  a  language  and  the  peculiarities  of  the  compiler. 
Thus  the  generation  of  a  language  processor  has  been  broken  into  four 
major  steps,  with  the  construction  and  execution  of  a  separate  BPL 
program  (or  programs)  for  each  step. 
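
The four-way division just described can be sketched in a few lines of Python. This is an illustration only, not the BPL framework of this thesis: the toy grammar (assignments of sums), the function names `scan`, `parse`, `generate` and `compile_source`, the `assigned` attribute, and the emitted instruction names are all hypothetical. The sketch shows the scanner loading symbols, the parser recognizing a construction and activating code emission, and the symbol table being shared by all three.

```python
import re


def scan(source, symbols):
    """Scanner: turn source text into tokens and load identifiers into the symbol table."""
    tokens = []
    for text in re.findall(r"[A-Za-z][A-Za-z0-9]*|[0-9]+|[=+;]", source):
        if text[0].isalpha():
            symbols.setdefault(text, {"assigned": False})
            tokens.append(("identifier", text))
        elif text[0].isdigit():
            tokens.append(("number", int(text)))
        else:
            tokens.append((text, text))
    tokens.append(("end", None))  # sentinel so the parser never runs off the end
    return tokens


def generate(target, operands, symbols, code):
    """Code generator: emit hypothetical stack-machine code for one assignment."""
    for position, (kind, value) in enumerate(operands):
        if kind == "identifier":
            if not symbols[value]["assigned"]:
                raise NameError(f"{value} is used before it is assigned")
            code.append(("LOAD", value))
        else:
            code.append(("PUSH", value))
        if position > 0:
            code.append(("ADD",))
    code.append(("STORE", target))


def parse(tokens, symbols, code):
    """Parser: recognize  identifier = operand { + operand } ;  and trigger emission."""
    pos = 0
    while tokens[pos][0] != "end":
        if tokens[pos][0] != "identifier" or tokens[pos + 1][0] != "=":
            raise SyntaxError("expected: identifier = ...")
        target = tokens[pos][1]
        pos += 2
        operands = []
        while True:
            if tokens[pos][0] not in ("identifier", "number"):
                raise SyntaxError("expected an identifier or a number")
            operands.append(tokens[pos])
            pos += 1
            if tokens[pos][0] != "+":
                break
            pos += 1
        if tokens[pos][0] != ";":
            raise SyntaxError("expected ';'")
        pos += 1
        generate(target, operands, symbols, code)
        symbols[target]["assigned"] = True  # attribute added during parsing


def compile_source(source):
    """Run scanner, parser and code generator over one source string."""
    symbols, code = {}, []
    parse(scan(source, symbols), symbols, code)
    return code, symbols


code, symbols = compile_source("a = 1 + 2; b = a + 3;")
print(code)
```

In a table-driven design such as the one described here, the hand-written bodies of `scan` and `parse` would be replaced by fixed algorithms driven by generated tables; the interfaces between the sections would stay much the same.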

These  steps,  or  phases,  are  identified  as  the  parse  phase, 
the  scan  phase,  the  semantics  phase,  and  the  merge  phase.  The  first 
three  generate  tables  which  are  combined  with  a  framework  compiler- 
emulator  in  the  fourth  phase  to  produce  the  final  language  processor. 

The  parse  phase  receives  the  syntax  of  the  language  and  produces  tables 
which  are  used  to  drive  a  parsing  algorithm  contained  in  the  compiler 
framework.  The  scan  phase  takes  as  input  the  specifications  for  the 
syntax  of  non-terminals  left  as  terminals  in  the  syntactic  specification 
for  the  parser  (such  things  as  <identifier>  and  <string>)  and  produces 
tables  for  a  scanning  algorithm.  The  input  to  the  semantics  phase
defines  the  symbol  table  format  and  the  semantics  to  be  associated  with 
each  syntactic  production.  The  various  algorithms  and  specification 
formats  for  each  of  these  phases  will  now  be  considered  in  the  order  in 
which  they  are  executed.

3.4  PARSE  PHASE

The  specification  for  the  user  language  for  which  a  compiler 
is  to  be  generated  is  done  either  in  Backus  Naur  form,  or  in  a  shortened, 
column-dependent  version  of  BNF  used  by  the  XPL  Analyzer  program.  This 
format  is  the  standard  used  throughout  the  field  of  computer  science, 
and  no  attempt  at  finding  an  alternate  method,  such  as  syntax  trees,  was 
successful  in  yielding  a  better  technique.  However,  the  algorithm  to  be 
used  in  parsing  was  considerably  less  obvious. 

Table-driven  parsers  for  bounded  context  languages,  such  as 
Gries'  transition  matrices  [GR1],  were  possible,  and  programs  can  be
and  have  been  written  to  generate  these  tables. 

However,  recent  work  by  DeRemer  [DR1]  on  a  DPDA  parser  for
LALR(k)  languages  seemed  more  promising  and  it  was  this  technique  which
was  finally  chosen.  A  much  deeper  discussion  of  this  subject  is  presented
by  another  member  of  this  project  in  his  thesis  [LA1]  and  so  no  further
discussion  of  the  parser  or  its  table  generator  will  be  given.  It  should 
be  pointed  out,  however,  that  the  parse  phase  must  be  executed  first,  as 
it  is  to  the  syntax  of  the  language  that  all  semantics  are  oriented.

3.5  SCAN  PHASE


The  simplest  scanner  is  one  which  passes  the  terminals  of  a 
language  character  by  character,  directly  to  the  parser.  However,  this 
forces  the  rules  for  the  formation  of  such  things  as  <identifier>,
<string>,  and  <number>  into  the  BNF  syntax  and  hence  increases  the  size
of  the  parsing  tables  and  drastically  reduces  compile  speed.  It  may  also 
introduce  local  ambiguities  which  complicate  the  parsing  process.  To 
circumvent  this,  ad  hoc  techniques  for  doing  some  parsing  in  the  scanner 
are  normally  employed,  such  as  the  recognition  of  identifiers,  character 
strings  and  numbers,  so  that  these  non-terminals  may  be  passed  directly  to 
the  parser.  The  scanner  is  also  used  to  remove  delimiting  blanks  and 
comments  prior  to  parsing.  The  scanning  process  can  be  modelled  by  the 
action  of  a  sequential  machine  whose  possible  inputs  consists  of  the  set  of 
all  characters  possibly  present  in  source  statements,  and  whose  outputs 
consist  of  the  set  of  all  terminals  of  the  parser,  and  a  null  output.  For 
example,  the  Mealy  machine  below  will  recognize  integer  numbers  delimited 
by  blanks : 
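
As a rough illustration of such a sequential machine, the following Python sketch gives one possible transition table for integers delimited by blanks. The state names, the input classes and the function name `scan_integers` are assumptions made for this sketch and are not taken from the system described here. Each entry maps a (state, input class) pair to a (next state, output) pair, where the output is either a parser terminal or the null output.

```python
# Sketch of a Mealy machine that recognizes integers delimited by blanks.
# State names, input classes and function names are illustrative assumptions.

def char_class(ch):
    """Map a source character to one of the machine's input classes."""
    if ch in "0123456789":
        return "digit"
    if ch == " ":
        return "blank"
    return "other"

# (state, input class) -> (next state, output); None is the null output.
TRANSITIONS = {
    ("START", "blank"): ("START", None),
    ("START", "digit"): ("IN_INT", None),
    ("START", "other"): ("ERROR", None),
    ("IN_INT", "digit"): ("IN_INT", None),
    ("IN_INT", "blank"): ("START", "INTEGER"),
    ("IN_INT", "other"): ("ERROR", None),
    ("ERROR", "blank"): ("START", None),
    ("ERROR", "digit"): ("ERROR", None),
    ("ERROR", "other"): ("ERROR", None),
}

def scan_integers(text):
    """Return (terminal, lexeme) pairs, one per blank-delimited integer."""
    state, lexeme, tokens = "START", "", []
    for ch in text:
        next_state, output = TRANSITIONS[(state, char_class(ch))]
        if output is not None:
            tokens.append((output, lexeme))
        # Accumulate characters only while inside an integer.
        lexeme = lexeme + ch if next_state == "IN_INT" else ""
        state = next_state
    return tokens
```

In this sketch a terminal is emitted only on the transition caused by the delimiting blank, so an integer at the very end of the input with no trailing blank produces no output.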


The best way of formalizing such a process is to write the regular
expressions for the strings recognized by the machine, a technique used in
the AED system developed at MIT [RS1].¹ Thus, to continue the example of
integers delimited by blanks, the following regular expression defines all
such strings:

INTEGER = n n* #     where n = {0,1,...,9}
                           # = {blank}

¹ The scanner specifications described in this thesis were developed without
knowledge of those in the AED system but bear a remarkable similarity to
them.

There is one drawback to using regular expressions to define the
terminals of a parser: no provision is made for setting upper and lower
bounds on the repetition operator *. In most computer languages, a
practical bound must be placed on the lengths of all tokens; for example,
the XPL identifier is restricted to, at most, 256 alphanumeric characters,
and must start with an alphabetic character. The expression

IDENTIFIER = a (a | n)*

must be modified to include these limits, yielding

IDENTIFIER = a (a | n)^255

This  last  format  is  the  one  adopted  for  the  specification  of  the 
scanner's  operation,  as  it  was  considered  to  be  the  most  straightforward 
means  of  describing  the  tokens  to  be  detected  by  the  scanner.  All 
parser  terminals  are  defined  in  terms  of  expanded  regular  expressions 
of  string  names,  and  these  strings  are  then  themselves  defined.  The 
details  of  this  process  are  given  in  the  next  chapter,  but  it  should  be 
noted  that  it  is  left  to  the  user  to  decide  the  relative  complexities 
of  the  parser  and  the  scanner. 
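
The effect of bounding the repetition operator can be illustrated with a present-day regular-expression library. The sketch below uses Python's `re` module, in which the `{m,n}` quantifier plays the role of the bounds. The pattern names, the helper `is_identifier` and the restriction to ASCII letters and digits are assumptions of this example; it does not reproduce the notation accepted by the scan table generator.

```python
import re

MAX_IDENTIFIER_LENGTH = 256  # the XPL limit quoted in the text

# Unbounded form, corresponding to a (a | n)*
UNBOUNDED = re.compile(r"[A-Za-z][A-Za-z0-9]*")

# Bounded form: one letter followed by at most 255 letters or digits.
BOUNDED = re.compile(
    r"[A-Za-z][A-Za-z0-9]{0,%d}" % (MAX_IDENTIFIER_LENGTH - 1)
)

def is_identifier(text, pattern=BOUNDED):
    """True if the whole of `text` is matched by `pattern`."""
    return pattern.fullmatch(text) is not None
```

With the bounded pattern a 257-character name is rejected, while the unbounded pattern accepts it; this is the distinction the modified repetition operator is meant to capture.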

An  interesting  project  would  be  the  automation  of  this 
division  process  to  provide  the  ultimate  in  recognition  speed  for  the 
emitted  compiler.  This  is  one  of  several  aspects  of  optimization  being 
considered by the SLAP project, but it is beyond the scope of this
thesis  to  deal  with  it. 

More  important  than  optimization,  however,  is  the  human 
engineering  aspect  of  designing  the  scanner  generator,  and  indeed  this 
concern  is  paramount  in  the  design  of  all  aspects  of  this  project.  The 
generating  system  must  be  easy  to  use  and  must  converse  with  the  user 
in  a  style  as  natural  to  the  user  as  possible.  At  each  stage,  the  user 
should  be  told  exactly  what  he  has  instructed  the  program  to  do.  This 
may  be  quite  different  from  what  he  wanted  to  do.  This  philosophy  is 
evident  throughout  the  design  of  this  processor  generating  system. 

3.6  SYMBOL  TABLE  ORGANIZATION 


All  variable  names,  labels,  procedure  names  and  constants 
are  stored  by  compilers  in  a  symbol  table.  The  efficiency  of  the  routines 
which access and modify the table affects the overall performance of the
compiler  to  a  great  extent,  and  the  decisions  made  in  choosing  a  symbol 
table  scheme  are  not  obvious.  In  fact,  it  is  evident  that,  although 
a  sufficiently  general  data  base  for  a  symbol  table  can  be  devised, 
the  algorithms  necessary  to  perform  all  of  the  manipulations  that  a 
language  designer  might  desire  can  not  be  provided.  Symbol  table 
semantics  are  therefore  split  into  two  parts:  the  structure  of  the 
table  itself,  and  the  routines  used  to  manipulate  it. 

The structure of the table is one of several trees connected in a
structured link system corresponding to the standard lexical nested
scope structure of symbols used in Algol and PL/I. Binary trees are
used rather than linear lists or hash tables because the trees are
considered to offer adequate access times for the typical case of a
half-full table as well as the easiest dumps so necessary in a compiler.
The nodes of each tree consist of two links and an arbitrary number of
attributes whose type and reference name the user specifies. Accesses to
a symbol's attributes are made using this reference name, and diagnostics
related to the symbol table use these names when reporting such things as
invalid or conflicting attribute assignments. There are some symbol
attributes that are automatically defined by the generator and used by the
emitted processor, such as the number of appearances of an entry. In
keeping with the ideals of extensibility and alterability, these automatic
features are replaceable, although such modifications must be made to the
generating programs.

The concept of a system of lexically linked trees was devised to
accommodate the most general nature of symbol scope. Nodes in trees
are permitted to reference nodes in more global trees; this corresponds
to the referencing of local variables to global ones. The linked
structure of the trees permits a dump of all active symbols to be given
for diagnostic purposes, while the use of separate trees for each distinct
program level permits very efficient searches to be made for local
variables. If pointers to global variables are inserted in the local
tree, search time is still much better than that for one large tree
whose nodes include the lexic level of the symbols represented.

The  few  examples  given  above  of  symbol  table  attributes  all 
pertain  to  the  compilation  phase,  although  the  symbol  table  is  maintained 
at  execution  for  the  purposes  of  diagnostics.  There  is,  of  course,  a 
most  necessary  aspect  of  the  symbol  table  that  has  not  yet  been 
mentioned:  namely,  the  references  to  actual  memory  used  to  store  the 
values  of  the  symbols.  Suffice  it  to  say  here  that  the  storage  references 
are  treated  simply  as  attributes  (which  they  are);  the  choice  of  the 
actual  addressing  mechanism  and  memory  allocation  is  discussed  in  a 
later  section. 

The  routines  needed  for  manipulating  the  symbol  table  are  all 
straightforward.  They  include  traversal  procedures  (for  such  things  as 
symbol  dumps),  search  and  insertion  procedures,  and  attribute  assignment 
and  recovery  procedures  for  individual  trees,  and  procedures  to  link 
together  the  headnodes  of  the  trees  in  the  specified  lexically  ordered 
structure.
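
The organization described in this section can be sketched in a few lines of Python. The sketch is only an illustration under stated assumptions: the class names `Node` and `ScopeTree`, the method names, and the use of a dictionary for attributes are all hypothetical, and the single automatic attribute shown (the number of appearances) is the one example mentioned above. Each scope owns one binary tree, and the trees are linked from the most local outward.

```python
class Node:
    """One symbol: two links plus named attributes (names are assumptions)."""
    def __init__(self, name):
        self.name = name
        self.left = None
        self.right = None
        self.attributes = {"appearances": 0}  # automatic attribute

class ScopeTree:
    """Binary tree for one lexical level, linked to the enclosing level."""
    def __init__(self, enclosing=None):
        self.root = None
        self.enclosing = enclosing

    def insert(self, name):
        """Find or create the node for `name` here and count the appearance."""
        if self.root is None:
            self.root = Node(name)
        node = self.root
        while node.name != name:
            side = "left" if name < node.name else "right"
            child = getattr(node, side)
            if child is None:
                child = Node(name)
                setattr(node, side, child)
            node = child
        node.attributes["appearances"] += 1
        return node

    def search_local(self, name):
        """Search this tree only."""
        node = self.root
        while node is not None and node.name != name:
            node = node.left if name < node.name else node.right
        return node

    def search(self, name):
        """Search this tree, then successively more global trees."""
        scope = self
        while scope is not None:
            node = scope.search_local(name)
            if node is not None:
                return node
            scope = scope.enclosing
        return None

    def local_names(self):
        """In-order traversal of this tree (alphabetical symbol dump)."""
        names, stack, node = [], [], self.root
        while stack or node is not None:
            while node is not None:
                stack.append(node)
                node = node.left
            node = stack.pop()
            names.append(node.name)
            node = node.right
        return names

    def dump(self):
        """Names of all active symbols, most local level first."""
        levels, scope = [], self
        while scope is not None:
            levels.append(scope.local_names())
            scope = scope.enclosing
        return levels
```

A local search touches only the small tree for the current level, while the dump follows the links between trees to list every active symbol.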

The  invocation  of  these  various  procedures  is  specified  in 
the  semantics  metalanguage  by  a  set  of  keyword-directed  statements. 

The  actual  format  of  these  statements  and  their  consequences  are 
described  in  the  next  chapter.  Provision  was  made  in  the  semantics 
specification  program  to  allow  the  user  to  define  his  own  procedures  to 
supplement  those  initially  supplied,  although  this  must  involve  actually 
modifying  the  source  programs  of  the  semantics  table  generator  and  the 
framework  processor,  for  no  macro  facility  is  planned. 

3.7 MEMORY ALLOCATION


The  scheme  used  for  the  allocation  of  memory  for  storing 
variables  and  constants  during  the  execution  of  programs  being  run  on  a 
generated  processor  affects  the  design  of  the  symbol  table,  code  emission 
routines  in  the  compiler  phase  and  the  pseudo-machine,  and  this,  of 
course,  affects  those  parts  of  the  generator  that  produce  the  tables 
for  these  routines.  Therefore,  a  mechanism  must  be  devised  that  is 
simple  and  efficient,  yet  flexible  so  that  a  user  can  add  his  own 
structures  to  the  storage  system.  Recursion  is  to  be  supported,  so  a 
direct  access  via  the  symbol  table  is  impossible.  As  well  as  simple 
recursion,  and  the  implicit  re-allocation  of  local  variables  that  this 
entails,  the  ability  to  explicitly  allocate  and  free  variables  (as  in 
PL/I)  is  also  desired.  Finally,  the  data  itself,  or  at  least  an 
immediate  pointer  to  it,  should  be  self-describing  as  to  its  type,  any 
history  of  how  the  value  was  acquired,  and  other  attributes  that  will 
possibly differ for different allocations of the 'same' (at compile-time)
variable.

To  achieve  these  goals,  the  decision  was  made  to  separate  the 
declaration  and  allocation  operations.  The  declaration  of  a  variable 
merely  alters  its  attributes  in  the  symbol  table  (all  variables  are 
placed  in  the  currently  active  symbol  table  by  the  scanner  when  they 
first  appear),  and  this  is  done  as  the  program  is  compiled.  The 
allocation  of  a  variable,  which  occurs  at  execution,  reserves  the  amount 
and type of variable storage necessary and updates the pointer from the
symbol  table,  through  which  all  accesses  to  variables  are  made.  The 
pointers  themselves  are  actually  maintained  as  a  linked  list,  so  that 
at  the  explicit  or  implicit  freeing  of  a  variable,  its  storage  pointer 
is  corrected. 
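
The separation of declaration from allocation can be sketched as follows. This is a minimal Python illustration, not the pseudo-machine's actual storage layout: the class names `Symbol` and `Descriptor`, the function names, and the representation of the linked list through a `previous` field are assumptions made for the example.

```python
class Descriptor:
    """One allocation of a variable; carries its own type (an assumption)."""
    def __init__(self, type_name, previous):
        self.type_name = type_name
        self.value = None
        self.previous = previous  # the allocation hidden by this one

class Symbol:
    """Symbol table entry with a storage pointer held as an attribute."""
    def __init__(self, name):
        self.name = name
        self.attributes = {}  # altered by declaration, at compile time
        self.storage = None   # current descriptor, set at execution

def declare(symbol, type_name):
    """Declaration: merely alters attributes in the symbol table."""
    symbol.attributes["type"] = type_name

def allocate(symbol):
    """Allocation: reserve a new cell and update the storage pointer."""
    symbol.storage = Descriptor(symbol.attributes["type"], symbol.storage)
    return symbol.storage

def free(symbol):
    """Freeing: restore the pointer to the previously hidden allocation."""
    if symbol.storage is None:
        raise RuntimeError("%s is not allocated" % symbol.name)
    symbol.storage = symbol.storage.previous

def history(symbol):
    """Values of all existing copies, most recent allocation first."""
    values, descriptor = [], symbol.storage
    while descriptor is not None:
        values.append(descriptor.value)
        descriptor = descriptor.previous
    return values
```

A recursive invocation would call `allocate` again for each local variable and `free` on exit, so that the earlier values reappear without any access through a display.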

This  scheme  can  be  contrasted  to  the  common  (lexic  level,  order 
number)  pair  addressing  used  by  Randell  and  Russell  in  their  Algol  60 
compiler  [RA1].  This  latter  technique  permits  the  allocation  of  all 
local variables in a procedure (or re-allocation for a recursive
invocation) by updating a single base address, which is certainly more
economical,  both  in  memory  requirements  and  time  necessary,  than  the 
former  proposal  in  its  execution.  Moreover,  it  probably  does  not  require 
any  more  overhead  at  compilation,  since,  although  the  total  number  of 
local  variables  in  a  program  segment  delimited  by  the  bounds  of  their 
scope  must  be  maintained  for  use  in  assigning  a  subsequent  base  address, 
this  is  offset  by  the  extra  code  emitted  for  each  allocation  to  be 
performed. 

As  to  execution  efficiency,  if  one  restricts  the  languages 
processed  to  those  like  Algol  or  XPL  in  that  allocation  is  an  intrinsic 
part  of  procedure  entry  and  is  not  known  explicitly,  then  the  pair 
addressing  scheme  is  again  superior.  However,  the  controlled  variable 
concept  of  PL/I  is  implicit  in  the  initial  proposal.  The  accessing 
mechanism  is  actually  simpler  in  design  in  this  scheme  as  well,  merely 
because  the  display  stack  is  unnecessary.  The  symbol  table  references  a 
stack which maintains the lists of descriptors (actual data pointers and
attributes)  for  the  existing  copies  of  each  variable’s  values.  The  return 
of  memory  (at  the  end  of  a  procedure,  for  example)  can  be  done  quite 
simply  by  virtue  of  the  ability  to  pair  operators  at  the  beginning  and 
end  of  a  program  segment.  This  feature  is  discussed  in  the  sections  on 
code  generation.  The  only  other  aspect  of  memory  allocation  is  that  of 
garbage collection which becomes necessary only with the repeated
allocation and freeing of several variables in some intermixed order within
a  given  scope.  However,  this  problem  is  inherent  in  any  scheme  allowing 
this  type  of  dynamic  memory  allocation  and  must  be  handled  at  execution 
time  by  the  pseudo-machine. 

As  a  final  note  on  the  matter,  it  is  very  significant  that  the 
extra  overhead  required  by  the  more  general  addressing  scheme  is  apparent 
only  when  executing  a  program  compiled  by  the  generated  processor.  The 
role  of  the  processor  is  that  of  a  batch  processor  for  student  jobs, 
which  necessitates  extremely  fast  compilation  speed  but,  since  only  about 
20%  (a  rather  small  proportion)  of  jobs  are  actually  executed,  does  not 
require  the  same  sort  of  speed  for  execution.  Moreover,  the  linked  list 
approach  for  several  copies  of  values  of  a  given  variable  permits  these 
values  to  be  easily  tabulated  for  diagnostic  purposes  at  run  time 
should  an  error  occur,  thus  yielding  a  trace  of  the  usage  of  each  variable. 
In  view  of  the  requirements  for  a  student  job  processor,  the  proposed 
allocation  scheme  is  considered  superior  to  that  of  Randell  and  Russell 
for  this  project. 

3.8 OBJECT MACHINE ARCHITECTURE


There  are  four  main  requirements  for  the  processors  generated 
by  this  system:  fast  compilation  speed,  excellent  diagnostics  at  both 
compilation  and  execution,  security  of  the  processor  in  that  source 
programs  submitted  to  it  can  never  destroy  it,  and  a  high  degree  of 
maintainability  and  compatibility.  When  considering  the  object  code  to 
be  emitted  by  the  compiler  phase  of  the  processor  and  then  executed, 
all  of  these  criteria  lead  to  the  adoption  of  a  pseudo-machine  as  the 
target  of  the  compiler. 

Excellent  compilation  speed  is  achieved  by  simplifying  the 
code  emitters  and  reducing  the  amount  of  code  produced.  This  can  be 
done  by  designing  a  pseudo-machine  optimally  suited  to  the  input  source 
language,  which  usually  involves  the  use  of  powerful  operations  at  a 
similarly high level to the source statements. This, of course, decreases
the number of object machine instructions emitted, since each operation
code does more. As well, more work can be left to the execution
phase  of  job  processing,  especially  in  the  area  of  error  detection  and 
recovery.  This  results  in  less  checking  at  compile  time,  and,  therefore, 
faster  compilations.  In  fact,  the  compiler’s  error  detection  is  limited 
almost  exclusively  to  syntax  checking.  Recalling  that  about  eighty  per 
cent  of  jobs  submitted  are  only  compiled  and  not  executed,  this  has  the 
overall  effect  of  increasing  job  throughput.  Since  most  error  detection 
of  a  semantic  sense  is  done  at  execution  time,  a  pseudo-machine  permits 
much  easier  checking  and  better  (and  easily  expandable)  diagnostic 
capability than if direct machine code were emitted and then executed.
Finally,  a  common  emulator,  if  well  designed,  can  be  used  by  many 
compilers  and  thus  compatible  object  code  and  a  uniform  technique  of 
error  handling  are  provided.  For  all  of  these  reasons,  and  with  the 
bonus  of  simpler  initial  prototype  completion,  the  object  machine  was 
chosen  to  be  a  pseudo-machine  rather  than  the  actual  hardware. 

The  machine  itself  is  relatively  straightforward.  Instructions 
are  stored  in  segments,  which  are  strings  of  sequentially  executed 
operation  codes.  Special  operations  in  a  segment  can  invoke  another 
segment  which  is  then  executed.  At  completion  of  a  called  segment, 
control  returns  to  the  calling  segment.  There  is  also  a  simple  branch 
instruction  to  transfer  directly  into  another  segment.  Besides  these 
invocation and branch operations, a set of zero-operand instructions
is defined which operates on the single accumulator stack in the machine.
Thus, computations are performed much like those in the Burroughs B5500
computer [BU1]. Additional commands in a segment are used to invoke
selector  lists,  which  perform  computed  branches  or  invocations  on  a 
list  of  segments,  the  argument  for  the  selection  being  taken  from  the 
accumulator  stack.  A  detailed  study  of  all  of  these  operations  is 
provided  in  the  next  chapter. 
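
The flavour of such a machine can be conveyed by a very small emulator loop. The Python sketch below is an illustration only: the operation names (`PUSH`, `ADD`, `MUL`, `CALL`, `BRANCH`, `SELECT`), the tuple encoding of instructions and the function name `run` are assumptions of the example and are not the operation codes of the pseudo-machine described in this thesis. In particular, `PUSH` carries a literal operand purely so that the example has data to work on.

```python
def run(segments, entry):
    """Execute the segment named `entry`; return the accumulator stack."""
    stack = []   # the single accumulator stack
    calls = []   # return points: (segment name, next instruction index)
    name, pc = entry, 0
    while True:
        code = segments[name]
        if pc >= len(code):
            # End of segment: return to the caller, or stop.
            if not calls:
                return stack
            name, pc = calls.pop()
            continue
        op = code[pc]
        pc += 1
        if op == "ADD":                       # zero-operand instruction
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        elif op == "MUL":                     # zero-operand instruction
            right = stack.pop()
            left = stack.pop()
            stack.append(left * right)
        elif op[0] == "PUSH":                 # load a literal (assumption)
            stack.append(op[1])
        elif op[0] == "CALL":                 # invoke a segment, then return
            calls.append((name, pc))
            name, pc = op[1], 0
        elif op[0] == "BRANCH":               # transfer without return
            name, pc = op[1], 0
        elif op[0] == "SELECT":               # computed invocation
            index = stack.pop()
            calls.append((name, pc))
            name, pc = op[1][index], 0
        else:
            raise ValueError("unknown operation: %r" % (op,))
```

A selector list is modelled here as a list of segment names indexed by the value popped from the stack; a computed branch would differ only in not recording a return point.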

One  area  of  the  pseudo-machine's  architecture  that  is,  perhaps, 
neglected  with  this  design  is  its  input/output  capability.  Two  segment 
operations  provide  for  the  reading  of  an  input  record  into  the 
accumulator  stack,  and  for  the  writing  of  an  output  record  from  the  top 
of the stack. More powerful I/O features, such as files, formats on
input  and  output,  and  others  are  to  be  added,  but  this  is  an  aspect  of 
this  project  beyond  the  scope  of  this  present  work. 


3.9  SEMANTICS  FOR  OBJECT  CODE  EMISSION 


As  was  stated  above,  the  architecture  of  the  pseudo-machine 
was  designed  to  facilitate  the  operation  of  the  compiler.  To  do  this, 
its  object  programs  have  been  devised  to  resemble  the  syntactic  structure 
of  the  source  program,  for  it  is  the  parsing  of  this  input  program  that 
is  used  to  trigger  the  code  emitters.  The  process  is  very  similar  to 
that  used  in  compilers  generated  with  the  SKELETON  proto-compiler  of 
the XPL system [MK1], with the two notable differences being the adoption
of Lalonde's parser [LA1] and the use of table-driven code emitters rather
than  actual  in-line  routines  to  do  this.  Care  must  be  taken  to  distinguish 
the  two  programs  concerned  with  these  semantics:  the  first  being  that 
part  of  the  processor  generation  procedure  that  translates  the  semantic 
specifications  for  each  syntactic  production  into  tables;  and  the  second 
being  that  part  of  the  compiler  phase  of  the  processor  skeleton  that 
actually  synthesizes  the  object  code.  In  considering  this  process,  the 
semantics  specifications  can  be  considered  a  source  program  that  is 
compiled  by  the  first  program  to  produce  the  semantics  tables  as  object 
code,  and  the  code  emitter  routine  of  the  proto-processor  can  be 
considered  as  a  pseudo-machine  that  executes  this  'object  code'. 

The dominant feature of the entire compilation phase is the
parsing  of  the  source  program.  This  operation  structures  the  source 
into  a  tree  whose  shape  reflects  the  organization  of  the  object  program, 
and  it  was  this  operation  that  inspired  the  use  of  program  segments 
generated  at  compile  time  and  then  executed  by  the  emulator.  In  most 
compilers,  be  they  written  manually  or  automatically,  code  is  emitted 
sequentially  and  pointers  must  be  maintained  to  make  later  corrections 
to  the  previous  code.  This  is  necessary  to  effect  the  re-ordering  of 
the  execution  sequence  of  a  program  and  to  resolve  branch  instructions 
emitted  before  their  targets  were  known.  However,  if  a  program  segment 
is  associated  with  each  syntactic  element  of  a  program,  then  just  as 
applying  a  production  results  in  the  replacement  of  some  number  of 
syntactic  tokens  by  one,  so  too  can  it  result  in  the  linking  of  the 
program  segment  associated  with  the  component  tokens  to  form  a  new 
segment  corresponding  to  the  resultant  token. 

The  actual  linking  is  not  restricted  to  the  sequential 
combination  established  in  syntactic  reduction;  rather,  it  is  permitted 
to  order  the  linking  in  any  sequential  way,  with  the  insertion  of  any 
pseudo-machine  operations,  or  to  order  the  segments  with  sublists  that 
permit  computed  or  selective  execution  at  object-time.  The  actual 
format  for  specifying  the  linking  of  segments  for  a  given  production  is 
discussed  in  the  next  chapter. 
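
The idea of linking segments at each reduction can be sketched as follows. This Python fragment is an illustration under stated assumptions, not the specification format of the next chapter: the function name `reduce_segments`, the representation of a segment as a list of operation names, and the template convention (an integer selects the segment of a right-side token, a string is an inserted pseudo-machine operation) are all hypothetical.

```python
def reduce_segments(segment_stack, rhs_length, template):
    """Replace the top `rhs_length` segments by one linked segment.

    `template` lists, in emission order, either the 0-based position of a
    right-side token (its segment is copied in) or an operation to insert.
    """
    split = len(segment_stack) - rhs_length
    parts = segment_stack[split:]
    del segment_stack[split:]
    linked = []
    for item in template:
        if isinstance(item, int):
            linked.extend(parts[item])   # segment of a component token
        else:
            linked.append(item)          # inserted pseudo-machine operation
    segment_stack.append(linked)
    return linked

# <expr> ::= <expr> + <term>   links operands in order, then adds.
stack = [["LOAD a"], [], ["LOAD b"]]
reduce_segments(stack, 3, [0, 2, "ADD"])

# <assignment> ::= <variable> = <expr>   evaluates the right side first.
stack = [["ADDRESS x"], []] + stack
reduce_segments(stack, 3, [2, 0, "STORE"])
```

Because the template may name the component segments in any order, no back-patching of previously emitted code is needed to re-order the execution sequence.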

3.10 ERROR RECOVERY

Although  this  section  has  been  left  to  the  end  of  this  chapter, 
it  is  by  no  means  the  least  important.  Rather,  the  design  criteria  for 
this system - that it be easy to use to generate processors, and that
the  processors  themselves  be  ultra-diagnostic  in  both  compilation  and 
execution  -  have  influenced  the  development  from  the  beginning.  The 
actual  methods  used  will  not  be  tabulated  here;  this  is  left  to  the 
detailed  analyses  of  the  programs  themselves;  but  a  resume  of  the  basic 
philosophy  of  error  handling  is  in  order. 

The  discussion  of  error  recovery  can  be  made  with  regard  to 
the  processor  generation  routine  as  well  as  the  operation  of  the  finished 
processor.  During  the  execution  of  the  generation  routines,  error 
detection is limited to invalid syntax on the input specification
statements and, where checking is possible, invalid semantics, such as
references to a procedure that does not exist for the language being
specified. While the error message given in such a case is as perspicuous
as possible, and the faulty card in the input stream is identified,
error  recovery  in  the  sense  of  assuming  default  values  or  of  trying  to 
make  some  sense  out  of  a  syntactically  incorrect  card  is  not  often 
attempted.  Such  techniques  are  advisable  for  student  jobs  where  an 
attempt  should  be  made  to  generate  as  much  diagnostic  information  as  is 
possible,  and  this  can  be  done  by  pushing  execution  of  the  program  until 
no  more  assumptions  can  be  justified.  However,  the  specification  of  a 
language must be very precise, so in practice it seems ill-advised to
attempt  to  employ  such  default  mechanisms  in  the  generation  stage.  As 
an  additional  attempt  at  averting  erroneous  processors,  the  specifications 
read  by  all  phases  of  the  generation  process  are  restated  in  English 
sentences  to  show  the  user  exactly  what  he  has  directed  the  processor  to 
do  in  a  given  situation,  or  the  exact  definition  of  some  entity  he  has 
described.  In  this  way,  the  user  will  discover  cases  where  what  he  said 
and  what  he  meant  to  say  are  entirely  different,  while  without  such  output 
an  error  might  go  unnoticed  for  quite  some  time  or,  if  its  effects  were 
apparent,  might  not  itself  be  discovered  for  some  time.  These,  then,  are 
the  main  considerations  in  error  handling  at  generation  time. 

In  the  operation  of  the  language  processor  itself,  there  are 
two  types  of  errors  that  are  detected:  syntactic  errors,  which  are 
found  during  compilation,  and  semantic  errors,  which  are  usually  not 
found  until  execution.  A  third  type  of  error,  namely  exceeding  the 
capacity  of  any  table  in  the  processor,  is  actually  a  deficiency  in  the 
processor  and  a  message  to  that  effect  is  produced.  However,  it  is  the 
first  two  types  of  errors  that  are  caused  by  a  fault  or  faults  in  the 
user's  program,  and  these  are  the  cases  of  interest. 

When  a  syntactic  error  occurs,  the  programmer  should  be 
informed  of  exactly  where  the  error  occurred,  which  token  was  the  last 
one  picked  up  by  the  parser,  and  then  be  given  a  dump  of  the  parse 
stack  at  the  time  of  the  error.  Recovery  from  such  an  error  should  be 
attempted  if  only  to  permit  the  remainder  of  the  source  program  to  be 
syntactically checked. The actual mechanisms for recovery are discussed
in the thesis describing the parser [LA1].

Semantic  errors  are  probably  the  more  difficult  to  deal  with, 
especially  to  the  depth  desired  for  a  teaching  language  processor.  The 
programmer  should  be  informed  of  the  exact  nature  of  the  error  and  have 
the  location  in  the  program  given  in  reference  to  his  source  program  and 
not  the  object  program.  Such  a  reference  should  yield  the  exact  line 
number  and  ideally  the  type  of  statement  involved.  The  variables 
specifically  involved,  if  any,  should  have  their  current  values  given  as 
well  as  where  they  acquired  those  values  and  possibly  any  hidden  values 
that  have  been  stacked  due  to  re-allocation,  either  explicit  or  due  to 
recursion.  A  trace  of  the  previous  several  statements  and  their  line 
positions  should  be  given,  as  well  as  all  currently  active  local  and 
global  variables.  Finally,  any  recovery  attempted  should  be  described, 
such  as  the  assigning  of  default  values.  In  addition  to  all  these  aids 
to  the  programmer,  the  language  designer  has  the  ability  to  modify  the 
output  given  at  error  detection  to  add  even  more  diagnostics  or  delete 
some  existing  ones  he  feels  are  unnecessary.  As  well,  various  trace 
facilities  are  provided  at  both  compilation  and  execution  so  that  a  user 
may examine the sequence leading to an error he cannot otherwise
understand. Once a semantic error has been diagnosed, it should, whenever
possible,  be  circumvented  so  as  to  allow  execution  to  continue.  However, 
since  each  error  will  generate  so  many  diagnostics  and  since  any  error 
will  necessitate  resubmission  of  a  student  job,  the  number  of  such 
recoveries is limited. As a final comment, it should be clear that
such complete diagnostic capability demands that a great many pointers,
counters and in-line checkpoint instructions be present. However, the
modelling of the object program structure after the source program
structure permits reconstruction of the source from the object. This is
particularly useful in reconstructing expressions, since the code generated
for them is identical to Polish postfix notation, and, hence, can be
converted to fully parenthesized form.
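
The conversion from postfix to fully parenthesized form is a standard stack algorithm, sketched below in Python. The function name `parenthesize`, the token representation and the particular set of binary operators are assumptions of the example; the diagnostic routines of the emulator are not reproduced here.

```python
def parenthesize(postfix, binary_operators=("+", "-", "*", "/")):
    """Rebuild a fully parenthesized infix string from postfix tokens."""
    stack = []
    for token in postfix:
        if token in binary_operators:
            if len(stack) < 2:
                raise ValueError("malformed postfix expression")
            right = stack.pop()
            left = stack.pop()
            stack.append("(%s %s %s)" % (left, token, right))
        else:
            stack.append(token)  # an operand
    if len(stack) != 1:
        raise ValueError("malformed postfix expression")
    return stack[0]
```

For the token sequence a, b, c, *, + the function returns "(a + (b * c))", which is the form in which an expression could be shown to the student alongside a diagnostic.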

The  discussion  of  the  actual  implementation  of  this  project 
does  not  deal  with  error  handling  per  se,  but  rather  incorporates  it  as 
an  integral  part  of  the  analysis  of  each  program,  which  it  is;  this  is 
presented  in  the  next  chapter. 

CHAPTER 4


IMPLEMENTATION 


4.1  PARSING  TABLES  GENERATION 


The creation of a language processor must naturally start with
the definition of the language involved. The first step is the
specification of the syntax of the language and the subsequent construction
of a parser for it, together with the verification that the language is not
ambiguous and is of a type parsable by the chosen algorithm. In this
system, the syntax of a language is given in Backus Naur form [BA1] or
in a similar, but column-dependent, format referred to as Analyzer form
because of its initial use by the ANALYZER program of XPL [MK1]. This
syntax, prefaced by a record detailing the options to be used, and
particularly the type of input format chosen, is fed to the parsing
table generator program of W.R. Lalonde [LA1] which constructs transition
and output tables for a finite-state machine that is the actual parser.

The  parser  itself  is  described  in  a  later  section.  The  actual  format  of 
the  tables,  and  the  details  of  their  generation,  are  well  documented  in 
the  reference  given  above  and  shall  not,  therefore,  be  given  here. 

A second set of tables is generated from the syntactic specification
of the language solely for use in subsequent phases of the processor
generation. These tables allow the construction of a production in terms
of the vocabulary initially used, and this can not be done directly from
the parsing tables. There are two tables: the first contains pointers
to a list for each production and the lists themselves; the second gives
the length of the list for each production. Since all subsequent
specifications for code emission for a production are referenced to the
syntax by the production number, these tables permit a record to be printed
showing both the original syntax and the associated semantics. This is
very necessary as a safeguard to detect the association of semantics with
the wrong syntax. Both of these tables are passed as temporary data sets
and are used in subsequent steps, as illustrated above. Ultimately, only
the parsing tables are saved when they are emitted as part of the
completed processor. The tables referenced are illustrated in Fig. 4.1.
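
The way such tables allow a production to be reprinted from its number can be sketched as follows. The Python fragment is a guess at a workable layout and not the actual format of Fig. 4.1: the array names other than VOCAB, the convention that each list holds vocabulary indices with the left side first, and the three-production grammar are all assumptions made for the example.

```python
# Vocabulary of tokens used in the syntax (the grammar is illustrative).
VOCAB = ["<expr>", "<term>", "+", "<identifier>"]

# Production lists, stored end to end as indices into VOCAB.
#   0: <expr> ::= <expr> + <term>
#   1: <expr> ::= <term>
#   2: <term> ::= <identifier>
PRODUCTION_LISTS = [0, 0, 2, 1,
                    0, 1,
                    1, 3]

PRODUCTION_START = [0, 4, 6]    # pointer to the list of each production
PRODUCTION_LENGTH = [4, 2, 2]   # length of the list of each production

def production_text(number):
    """Rebuild the source form of a production from its number."""
    start = PRODUCTION_START[number]
    end = start + PRODUCTION_LENGTH[number]
    symbols = [VOCAB[i] for i in PRODUCTION_LISTS[start:end]]
    return "%s ::= %s" % (symbols[0], " ".join(symbols[1:]))
```

Printing `production_text(p)` beside the semantics recorded for production p gives the kind of cross-check described above, in which semantics attached to the wrong syntax become visible.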

4.2  SCAN  TABLES  GENERATION 


In most cases, a parser's terminals consist not only of keywords
or special symbols, which are constructed from the character set to be
used to write source programs for the processor, but also of non-terminals
whose detection is left to the scanner. A common example is the
<identifier>, a non-terminal found in most teaching languages. While
some languages (notably Algol 60 [R01]) specify the syntax of <identifier>
as

<identifier> ::= <letter> | <identifier> <letter> | <identifier> <digit>

and then go on to specify

<letter> ::= a | b | c | ... | y | z

<digit> ::= 0 | 1 | ... | 8 | 9

most languages leave this to the scanner. This is necessary when the
number of recursive productions is to be limited, as in FORTRAN IV where
the length of an identifier can not exceed six alphanumeric characters.
These considerations, which were discussed in the previous chapter, led
to the modified regular expression format described there, and it is
the second phase of the processor generation that builds tables used to
drive the scanning algorithm which is a standard part of the emitted
processor. The operation of the scanner is described later; this section
deals only with the table generation.

Figure 4.1  Supplementary Parse Tables

These tables are used to store the syntactic productions for
diagnostic purposes. Their generation is discussed
in section 4.1, and their use, in section 4.3.

P# is the total number of syntactic productions.

VOCAB is the character array of tokens used in the syntax.

The  input  to  the  scan  table  generating  program  consists  of 
definitions  of  the  terminal  "non-terminals"  of  the  syntax  (those  tokens 
having  the  format  of  non-terminals,  but  appearing  only  on  the  right  side 
of  productions)  in  terms  of  regular  expressions  in  character  sets,  and 
the  definition  of  these  character  sets.  The  two  types  of  definitions  may 
be  intermixed;  tables  of  each  are  built  simultaneously  and  resolved  and 
checked  for  completeness  after  all  of  the  input  records  have  been  read. 

An  additional  feature  of  the  program  is  the  specification  of  a  character 
string  to  signal  the  end  of  a  job  in  batch  processing.  This  string  is 
used  by  the  scanning  algorithm  in  the  emitted  compiler  to  define  the 
end-of-job,  and  to  prevent  the  compiler  from  reading  beyond  the  limits 
of  one  job  into  the  next. 

The operation of the program is perhaps best explained by
a formal description of the input to it, followed by an example. As was
mentioned in Chapter 3, the tokens to be recognized by the scanner can
be described by restricted regular expressions1 in character strings,
using a modified repetition operator. The syntax of such regular
expressions used to define non-terminals can itself be described by2

{non-terminal definition} ::= {non-terminal} {term list}

{term list} ::= {term} | {term list} § {term}

{term} ::= {repetition operator} {factor list}

{repetition operator} ::= {number} TO {number} | EXACTLY {number}
                        | AT MOST {number} | AT LEAST {number}

{factor list} ::= {factor} | {factor list} / {factor}

{factor} ::= {string} | ¬ {string}

and, for compiler compatibility,

{non-terminal} ::= < {non-terminal name} > .

Note that the operator separating the terms of a {term list} is the
catenate operator, and "/" is the union operator.
Both {string} and {non-terminal name} are strings of upper-case characters.
The adaptation of this formalism to a card-oriented (or in general, input
record oriented) format will be given in a moment, but first the syntax of
the statements defining the strings is given:

{string definition} ::= {string} {match type} {character string}
                        {translation string}

{match type} ::= ALL | ANY

{character string} ::= @ {actual string} @

{translation string} ::= {null} | => @ {actual string} @

where {actual string} is a character string which contains any valid
EBCDIC character3.

1 The reader who is unfamiliar with regular expressions may refer to
FINITE-STATE MODELS FOR LOGICAL MACHINES: F.C. Hennie, Chapter 5, Wiley & Sons,
New York, 1968, or any similar text.

2 Note that here the "{}" are used rather than the usual "<>" brackets to
avoid later confusion with the use of "<" and ">" as terminals.

Definitions of the two types above form the input to the
scan table generating program. The first element of either definition must
be left-justified and completed within one input record, except that for a
non-terminal definition, the term list is not written using the catenate
operator but rather by having one term following the non-terminal on the
first card, followed by as many input records as there are extra terms,
each record containing one term that must not be left-justified.

The meaning of these definitions is straightforward. A non-terminal
is defined as a sequential list of terms catenated together,
with each term defined as a lower and upper bounded repetition of a factor
list. The factor list is defined as being any of the factors in it, so
that each repetition of the factor list can select a different factor, if
sufficient factors are there. The factor is either a character string
represented by a string name, or the complement of the character string.

The string itself is either any of the actual characters in the
string or all of them, depending on the match type. A translation string
is used to perform a replacement of a string forming a factor of a recognized
non-terminal by a new string before the scanner passes the non-terminal
to the parser.

3 The character "@" must be represented by "@@", following the example of
single quotes within PL/I character strings.

To  give  an  example,  a  character  string  as  in  PL/I,  but 
limited  for  implementation  purposes  to  32  characters,  can  be  recognized 
by the scanner with the following definitions:

<STRING>  EXACTLY 1 QUOTE
          AT MOST 32 DQUOTE / ¬ QUOTE
          EXACTLY 1 QUOTE
QUOTE     ALL @'@ => @@
DQUOTE    ALL @''@ => @'@

The  tables  created  for  this  input  will,  when  coupled  with  the  scanning 
algorithm  described  later,  recognize  all  character  strings  delimited  by 
single  quote  marks  and  replace  imbedded  pairs  of  quote  marks  by  single 
quote  marks.  The  delimiting  single  quotes  are  not  passed  to  the  parser 
as  part  of  <STRING>. 
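
The effect of these definitions can be sketched in Python. This is only an illustration under stated assumptions, not the table-driven algorithm of the emitted processor: the names Factor, Term and recognize are hypothetical, matching is greedy with no backtracking, only match type ALL is modelled, and a complemented factor is assumed to name a single-character string.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Factor:
    text: str                          # the character string (match type ALL assumed)
    complement: bool = False           # True: any one character other than `text`
    translation: Optional[str] = None  # replacement text; None passes the match unchanged


@dataclass
class Term:
    low: int               # minimum number of repetitions
    high: int              # maximum number of repetitions
    factors: List[Factor]  # alternatives of the factor list (joined by "/")


def match_factor(f: Factor, buf: str, pos: int) -> Optional[Tuple[int, str]]:
    """Return (characters consumed, text passed on) if f matches at pos."""
    if f.complement:
        if pos < len(buf) and buf[pos] != f.text:
            return 1, buf[pos]
        return None
    if buf.startswith(f.text, pos):
        return len(f.text), f.text if f.translation is None else f.translation
    return None


def recognize(terms: List[Term], buf: str) -> Optional[Tuple[str, int]]:
    """Match the catenated terms at the head of buf.

    Returns (translated token text, characters consumed) or None.
    """
    pos, out = 0, []
    for term in terms:
        count = 0
        while count < term.high:
            for f in term.factors:
                m = match_factor(f, buf, pos)
                if m is not None:
                    break
            else:
                break  # no factor of this term matches here
            pos += m[0]
            out.append(m[1])
            count += 1
        if count < term.low:
            return None
    return "".join(out), pos


# The <STRING> definition from the text, in this hypothetical encoding.
QUOTE = Factor("'", translation="")
DQUOTE = Factor("''", translation="'")
NOT_QUOTE = Factor("'", complement=True)
STRING = [Term(1, 1, [QUOTE]), Term(0, 32, [DQUOTE, NOT_QUOTE]), Term(1, 1, [QUOTE])]
```

Calling recognize(STRING, buffer) on a buffer that begins with a quoted string yields the string body with each imbedded pair of quotes reduced to one quote, and with the delimiting quotes dropped because they translate to the null string.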

In  addition  to  defining  tokens  for  the  parser,  special  tokens 
may  be  defined  which  are  trapped  by  the  scanner  and  not  passed  to  the 
parser.  Two  such  tokens  are  used  at  present,  although  more  may  be  easily 
added  by  modifying  a  list  in  the  scan  table  generating  program.  The  first 
is  <NULL>,  which  may  be  used  to  sluff  comments  and  delimiting  strings  of 
blanks.  The  second  is  <TOGGLE>,  which  is  used  to  allow  the  scanning 
algorithm  to  pass  strings  to  a  special  routine  in  the  emitted  processor 
to control compiler options such as listing, traces of code emission, and
others that a user may add. However, the use or addition of such trapped
tokens will be left to the discussion of the table-driven scanner.

Once the input stream is exhausted, two checks for completeness
are made. The first is that all tokens of the form "<...>" left as
terminals by the syntax specification have now been defined in the scanner
specification.  The  second  is  that  all  strings  used  in  defining  tokens 
have  themselves  been  specified.  As  these  checks  are  made,  tables  are 
finalized  prior  to  the  output  of  a  record  of  what  the  scanning  algorithm 
will  recognize  when  coupled  with  the  tables,  and  the  emission  of  the 
tables  themselves. 
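
The two completeness checks can be sketched as follows. This is a minimal Python illustration, assuming the generator holds the token definitions as a mapping from token name to the string names it uses; the function name and data layout are hypothetical and are not taken from the generating program.

```python
from typing import Dict, Iterable, List, Set, Tuple


def check_completeness(
    syntax_terminals: Iterable[str],
    token_defs: Dict[str, List[str]],
    string_defs: Set[str],
) -> Tuple[List[str], List[str]]:
    """Return (undefined tokens, undefined strings).

    syntax_terminals: tokens of the form "<...>" left as terminals by the syntax.
    token_defs: token name -> names of the strings used in its definition.
    string_defs: names of all strings that have been defined.
    """
    # Check 1: every "<...>" terminal of the syntax has a scanner definition.
    undefined_tokens = sorted(t for t in syntax_terminals if t not in token_defs)
    # Check 2: every string used in a token definition has itself been defined.
    used = {name for names in token_defs.values() for name in names}
    undefined_strings = sorted(used - set(string_defs))
    return undefined_tokens, undefined_strings
```

Both lists being empty corresponds to a specification that passes the checks.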

The  first  printed  output  of  the  scan  table  generator  is  a  trace 
of  input  records.  Any  errors  in  syntax  are  detected  here  and  in-line 
messages  produced.  The  second  output  printed  is  a  restatement  of  the 
definitions  of  all  parser  tokens  to  be  detected  by  the  scanner,  but 
instead  of  defining  them  in  terms  of  string  names,  the  tokens  are  defined 
directly  in  terms  of  the  character  strings.  Each  definition  appears  as 
an  English  sentence,  rather  than  in  the  terse  format  of  the  input 
records.  Thus  the  previous  definition  of  <STRING>  results  in  the  output: 


A <STRING> IS RECOGNIZED AS EXACTLY 1 OF "'" WHICH TRANSLATES TO ""
    FOLLOWED BY NO MORE THAN 32 OF "''" WHICH TRANSLATES TO "'"
        OR NOT "'" (ANYTHING ELSE IS PASSED UNCHANGED)
    FOLLOWED BY EXACTLY 1 OF "'" WHICH TRANSLATES TO ""

In  addition,  the  end  of  program  marker,  if  any,  is  output  now. 

The  next  two  tables  specify  head  character  decisions  for  tokens 
and  trapped  tokens,  but  these  shall  be  discussed  as  part  of  the  scanning 
algorithm. 

The final output is a list of the tables produced, the use of
which is again left to the discussion of the table-driven scanner in a
later section. Their formats are also given there, in Fig. 4.3, to
illustrate their use.

The  tables  are  actually  emitted  (as  are  all  tables  produced 
by  the  processor  generator)  as  a  series  of  declaration  statements  in 
the  base  programming  language  with  initial  attributes  specifying  the 
contents  of  the  tables.  These  declaration  statements  are  merged  with 
the source program of the proto-processor to form a complete processor;
this action is described in section 4.8.




4.3  SEMANTICS  TABLES  GENERATION 


The  third  phase  of  the  processor  generation  produces  three 
distinct  sets  of  tables:  those  which  serve  as  a  data  base  and  define 
the  attributes  stored  in  the  symbol  table;  those  that  define  the 
library  of  user-supplied  routines,  if  any,  that  may  be  referenced  by 
the  semantic  routines  of  the  compiler  in  the  emitted  proto-processor; 
and  those  that  drive  the  code  emitters  and  other  semantic  routines 
invoked when a syntactic production is applied. Since the three sets of
tables do interact, they are created in the order shown above by
one generating program; however, the input specifications for them are
separate and each table is handled within the main program by a distinct
procedure1. The purpose of each set of specifications and the format of
these are discussed separately; the tables produced and the procedures
that use them are described (and illustrated where applicable) in a later
section.

4.3.1  SYMBOL  TABLE 


The definition of the structure of the symbol table is the
first set of specifications processed. The structure discussed in Section
3.6 is implemented, and the necessary declaration statements in the base
programming language are emitted to form the data base for the tree of
headnodes as described. This superstructure for the symbol table structure
is illustrated in Fig. 4.4, included in section 4.7 on the emulator.

1 For convenience, the three groups are, at the time of writing, separated
by control cards and read as one file.

The  various  symbol  table  binary  trees  which  store  the  local  variables 
of  a  particular  scope  of  some  subset  of  the  source  program  are  all  kept 
in  one  set  of  four  arrays,  hereafter  called  the  symbol  table.  The  size 
of the symbol table (representing the total number of elements that may
be stored in all the trees) is set by a literal declaration in the
semantics  table  generation  program.  Two  arrays  are  dimensioned  the  size 
of  the  symbol  table  and  are  used  to  store  the  left  and  right  links  of 
the  trees  that  form  the  symbol  table.  The  other  two  arrays,  one  of  type 
FIXED  and  one  of  type  CHARACTER,  are  used  to  store  all  attributes  of  the 
symbols1.  The  size  of  these  tables  depends  on  the  number  and  type  of  the 
user-declared  attributes  of  the  symbols,  if  any.  The  declaration  of 
these  attributes  is  the  only  specification  of  the  symbol  table  structure 
by  the  user,  although  any  aspect  of  the  table  or  its  superstructure  can 
be  modified  by  modifying  the  generating  program.  Each  attribute  is 
specified on a separate input record; the syntax of this input is2:

<attribute definition> ::= <attribute name> <type>

<type> ::= FIXED <fixed initial> | CHARACTER <character initial>

<fixed initial> ::= <null> | INITIALLY <integer>

<character initial> ::= <null> | INITIALLY <string>

1 The base programming language, like XPL, only has variable types FIXED,
CHARACTER and BIT(n). For simplicity all attributes have been forced to
be FIXED or CHARACTER, thus allowing storage of integers as fixed words,
and character strings of length zero to 256 bytes.

2 Expressed in the usual BNF.

As well as any user-specified attributes, the generating
program  automatically  declares  the  attributes  "NAME",  "STATUS"  and 
"NO.  OF  APPEARANCES"  of  types  character,  character  and  fixed,  respectively. 
These  attributes  are  necessary  as  the  scanning  algorithm  has  in  it  routines 
to load all terminals of the type left as <...> by the parser into the
symbol  table,  setting  these  attributes  when  the  terminal  is  loaded  and 
modifying  the  "NO.  OF  APPEARANCES"  attribute  on  subsequent  appearances.  If 
an  initial  value  is  not  specified  for  an  attribute,  default  values  of  zero 
or  null  string  are  assumed.  This  is  also  done  if  an  invalid  specification 
is  made,  such  as  an  alphabetic  character  in  the  initial  value  for  a  FIXED 
attribute.  Once  all  user-defined  attributes  are  read,  the  total  number  of 
elements  required  by  each  of  the  two  attribute  arrays  is  known,  this  being 
simply  the  attribute  count  for  the  appropriate  type  multiplied  by  the  total 
number  of  symbols.  Each  attribute  is  assigned  an  offset  which  is  used  in 
all  references  to  it  by  the  routines  accessing  the  symbol  table  for  a  given 
symbol's  attribute.  Thus,  to  find  the  fixed  attribute  m  of  symbol  number  n, 
the  computation  is  identical  to  that  for  a  two  dimensional  array  reference 
(m,n);  namely,  the  value  of  m  multiplied  by  the  symbol  table  size,  plus  n 
yields  the  proper  index  of  the  fixed  attribute  array.  A  discussion  of  the 
routines that can be invoked by the emitted compiler from its code generation
procedure, or by the emulator during execution of a compiled program,
is left to later sections on these respective subjects.
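
The index computation described above can be written out directly. The following Python sketch assumes zero-based attribute offsets and symbol numbers; the table size of 100 and the function name are hypothetical values chosen for illustration.

```python
SYMBOL_TABLE_SIZE = 100  # hypothetical; the thesis sets this by a literal declaration


def attribute_index(offset: int, symbol: int, table_size: int = SYMBOL_TABLE_SIZE) -> int:
    """Index into a one-dimensional attribute array for attribute `offset`
    of symbol number `symbol`: offset * table size + symbol."""
    if not 0 <= symbol < table_size:
        raise IndexError("symbol number outside the symbol table")
    return offset * table_size + symbol


# An attribute array for three FIXED attributes holds 3 * SYMBOL_TABLE_SIZE
# elements, each attribute occupying one contiguous run of table_size entries.
fixed_attributes = [0] * (3 * SYMBOL_TABLE_SIZE)
fixed_attributes[attribute_index(2, 5)] = 42  # attribute 2 of symbol 5
```

With this layout all values of one attribute are stored contiguously, so the offset assigned to an attribute is the only per-attribute information the access routines need.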

The tables (in the form of declaration statements) emitted for
the symbol table and its related structure include the four actual symbol
table arrays discussed above. Also emitted are tables (again as declaration
statements) of the attribute names, types, initial values and reference
numbers  for  indexing  purposes,  and  various  declarations  for  global 
variables  used  in  symbol  table  procedures  resident  in  the  proto-processor, 
including  literal  declarations  for  the  number  of  elements  in  the 
symbol  table  and  the  number  of  attributes  for  each  symbol.  In  addition, 
as  was  stated  earlier,  an  array  is  declared  for  storing  the  tree  of 
headnodes  necessary  to  permit  different  symbol  table  trees  to  store 
symbols known on different lexic levels of a program. The attributes
of the symbol table, and the purpose of the tables emitted, are respecified
in English, together with a brief restatement of the table's general
organization, as was done for the scanning tables, to ensure that the
user realizes precisely the form that he has specified for the symbol table.

4.3.2  USER-SUPPLIED  ROUTINES 


To  permit  a  user  to  invoke  his  own  semantics  routines  during  the 
compilation  of  jobs  by  the  processor  being  created,  a  library  of  the  names 
of  these  procedures  is  necessary,  as  shall  be  seen  below  when  the  mechanism 
of  their  invocation  is  given.  The  recording  of  any  such  procedures  is  done 
by  simply  placing  each  procedure’s  name  on  a  separate  source  record. 

As  the  file  of  these  names  is  read,  they  are  recorded  in  a 
character  array.  Later,  during  execution  of  the  semantics  table  generation 
program,  these  names  are  printed,  together  with  the  numeric  code  by  which 
they are known internally in the compiler. The tables emitted for them
are simply a literal declaration of the number of such routines, and a
declaration statement for a character array initialized to these names,
which is used for possible diagnostic purposes in the compiler section of
the emitted processor.

4.3.3  SEMANTICS  ASSOCIATED  WITH  SYNTAX 

The  previous  two  sub-sections  have  dealt  with  specifications  that 
are  global  in  nature;  that  is,  they  are  not  referenced  to  any  specific 
productions  in  the  syntax  of  the  language  to  be  compiled  by  the  emitted 
processor.  The  last  category  of  semantic  specifications  which  is  now 
described  covers  all  of  the  actions  of  the  processor's  compiler  that 
are  performed  upon  application  of  any  syntactic  reduction  by  the  parser. 
Included  here  are  all  code  emission,  symbol  table  manipulation,  invocation 
of  special  routines  supplied  by  the  user,  and  parse  stack  manipulation 
independent  of  that  done  by  the  parser.  All  such  operations  are  attached 
to  a  specific  syntactic  production,  and  as  many  operations  as  required 
may  be  specified  for  a  given  production. 

Each  type  of  semantic  specification  (there  are  eight  at  the 
time  of  writing  this  thesis)  is  described  separately,  and  the  syntax  of 
each  is  given  with  it.  However,  the  general  structure  of  the  input  should 
be  described  first. 

Ultimately, a format permitting the specification of syntax and
semantics together for each syntactic production is planned, but in this
implementation the semantics is specified separately from the syntax,
which makes the operation of the processor generator simpler, as each
of these two basic language specifications is used as input by separate
phases. Therefore, marker records are used to link the various semantic
specification blocks to their syntactic specifications. This is done
using the number of each syntactic production, as described below.

The  first  type  of  semantic  specification  is  the  "P"  type  which 
is  used  to  specify  a  production  number.  Its  syntax  is: 

<production semantics> ::= <p-head> <production number>

<p-head> ::= P | PRODUCTION

The  syntactic  production  specified  is  the  one  to  which  all  subsequent 
semantics  are  attached  until  a  new  "P"  record  is  read.  If  no  production, 
or  an  invalid  production,  is  specified,  subsequent  semantics  (except 
production  specifications)  are  ignored. 

The  next  logical  group  to  discuss  is  that  dealing  with  the 
semantics  of  code  emission.  These  are  the  "S",  "A",  "L"  and  "G"  types. 
They  are  all  used  to  specify  the  linking  or  generation  of  program  segments 
by  the  compiler.  As  was  discussed  in  the  previous  chapter,  all  syntactic 
entities  have  associated  with  them  program  segments.  When  a  syntactic 
production  is  applied  and  these  tokens  are  collapsed  into  one  token,  a 
similar  semantic  production  is  usually  required  and  the  routines  these 
specifications  invoke  at  compile  time  handle  this. 

The simplest of these, and probably the most often used, is the
"S", or sequential type. Its syntax is:

<sequential semantics> ::= <s-head> <target> <link> <list>

<s-head> ::= S | SEQUENTIALLY

<target> ::= <null> | AT <token>

<link> ::= <null> | LINK

<list> ::= <list element> | <list> , <list element>

<list element> ::= <token> | <op>

The  syntactic  production  to  which  this  semantics  refers  consists  of  a 
left-side  token  which  is  the  goal  of  the  production,  and  on  the  right  side 
a  series  of  tokens,  be  they  terminals  or  non-terminals.  If  tokens  are 
numbered  sequentially,  then  the  goal  symbol  is  number  zero,  the  first  token 
on  the  right  side  of  the  production  is  number  one,  etc.  Thus,  for 
SYNTAX  :  <0>  ::=  <1>  <2> 

SEMANTICS  :  S  2,  1 

the application of the syntactic production will cause the program segment
associated with the syntax element <2> to be linked sequentially to the
program segment associated with <1> and the resultant segment to be
associated on the parse stack with the token <0>, as the default for the
target token is zero. In addition, pseudo-machine operation code mnemonics
may appear in the list so that
S  2,  1,  ADD 

would produce a new program segment consisting of the old segment 2, followed
by segment 1, followed by the zero-operand instruction ADD. Other special
op codes may be inserted, but these are left to the discussion on the code
emission routines and the emulator.
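
The sequential linking can be sketched in Python. This is an illustration only: it assumes that a program segment can be modelled as a list of instruction mnemonics held per token number, whereas the emitted compiler links segments in pseudo-machine memory. The function name and the LOAD mnemonics are hypothetical; ADD is the op code used in the text.

```python
from typing import Dict, List, Union


def link_sequentially(
    segments: Dict[int, List[str]],
    items: List[Union[int, str]],
    target: int = 0,
) -> List[str]:
    """Apply an "S" record: catenate the listed segments and op codes in order
    and associate the result with the target token (default zero, the goal)."""
    new_segment: List[str] = []
    for item in items:
        if isinstance(item, int):
            new_segment.extend(segments[item])  # a token number: copy its segment
        else:
            new_segment.append(item)            # an op code mnemonic
    segments[target] = new_segment
    return new_segment


# SYNTAX : <0> ::= <1> <2>      SEMANTICS : S 2, 1, ADD
segments = {1: ["LOAD A"], 2: ["LOAD B"]}
link_sequentially(segments, [2, 1, "ADD"])
```

After the call, the segment associated with token zero consists of segment 2, then segment 1, then ADD, which matches the order given in the record.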
The next type of semantics is the "A", or alternate, type.
This is used to build a list of segments to be executed selectively, the
selection being determined by the current value at the top of the
accumulator stack in the pseudo-machine. Its syntax is:

<alternate semantics> ::= <a-head> <target> <link> <list>

<a-head> ::= A | ALTERNATELY

<target> ::= <null> | AT <token>

<link> ::= <null> | LINK

<list> ::= <list element> | <list> , <list element>

<list element> ::= <token> | <op>

If  this  semantic  action  is  invoked  at  compile  time  in  the  emitted 
processor,  a  list  of  segment  addresses  is  built  and  a  new  segment 
containing  a  special  op  code  is  created  which  points  to  this  list. 

It is the address of this new segment that is associated
with the target syntax element. If the target token is not specified,
zero is assumed. This structure is illustrated in Fig. 4.5 which
shows a compiled program in the memory of the pseudo-machine (section
4.7).

The  "L",  or  link,  type  is  used  for  recursive  productions 
where an alternate linkage, rather than sequential, is desired. It is
used to extend the list of an existent alternate selection instruction.
Its syntax is:

<link semantics> ::= <l-head> <target> <add> <list>

<l-head> ::= L | LINK

<target> ::= <null> | TO <token>

<add> ::= <null> | ADD

<list> ::= <list element> | <list> , <list element>

<list element> ::= <token> | <op>

and  when  the  semantics  specified  by  this  record  is  performed  by  the 
compiler,  the  segment  associated  with  the  target  syntax  element  must 
be  of  the  "A"  type.  As  before,  a  null  target  specification  defaults 
to  zero. 
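
The relationship between the "A" and "L" types can be sketched as follows. This Python fragment is a simplified model under stated assumptions: segments are held in a dictionary keyed by token number, the list of segment addresses is modelled as a list of segments, and the class name, function names and the SELECT mnemonic are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class AlternateSegment:
    """A segment holding one special op code that points to a list of alternatives."""
    op: str = "SELECT"  # hypothetical mnemonic for the alternate selection instruction
    alternatives: List[object] = field(default_factory=list)


def build_alternate(segments: Dict[int, object], tokens: List[int], target: int = 0) -> None:
    """Apply an "A" record: build the list and associate the new segment with the target."""
    segments[target] = AlternateSegment(alternatives=[segments[t] for t in tokens])


def extend_alternate(segments: Dict[int, object], tokens: List[int], target: int = 0) -> None:
    """Apply an "L" record: extend the list of an existing "A" type segment."""
    seg = segments.get(target)
    if not isinstance(seg, AlternateSegment):
        raise TypeError("the target segment of an L record must be of the A type")
    seg.alternatives.extend(segments[t] for t in tokens)
```

The type check in extend_alternate reflects the requirement that the segment associated with the target of an "L" record be of the "A" type.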

The last of this group is the "G", or group, type, which
is reserved to specify any actions concerned with the program instruction
stack. It is not currently used, but was built into the system
and has been left in case an extra type of semantics is necessary in
this area.

The  next  set  of  semantics  deals  with  the  invocation  of  routines 
which  may  emit  code  but  are  primarily  used  to  manipulate  the  parse  stacks 
and  the  symbol  table.  The  "T"  type  handles  symbol  table  operations, 
including  those  involving  operations  on  the  tree  of  symbol  table  tree 
headnodes.  The  operations  specified  are  all  keyword  oriented,  and 
include  statements  to  set  attributes  from  the  auxiliary  parse  stack, 
to  conditionally  modify  them,  as  well  as  add  symbols  and  perform  dumps. 
Two  statements  are  used  for  the  headnode  tree,  these  being  an  up  and 
a  down  command.  Each  down  command  begins  a  new  tree;  each  up  command 
reactivates the parent tree and de-activates the old tree for the
remainder of the compilation. Instructions are emitted for this operation
so that the headnode of the currently active tree is known to
the emulator at execution of the compiled program, but this aspect is
more appropriately left to the discussion on the table-driven code
emitter and emulator later in this chapter. The actual format of the
various "T" commands is not given here because they are currently under
review. The present set is given in the appendices.
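
The behaviour of the up and down commands on the headnode tree can be sketched in Python. This is a minimal model, assuming each headnode records only its parent and whether it is still active; the class and attribute names are hypothetical, and the emission of instructions for the emulator is omitted.

```python
from typing import List, Optional


class HeadnodeTree:
    """Tree of symbol table tree headnodes, one headnode per scope."""

    def __init__(self) -> None:
        self.parent: List[Optional[int]] = [None]  # headnode 0 is the outermost scope
        self.active: List[bool] = [True]
        self.current = 0

    def down(self) -> int:
        """Begin a new symbol table tree whose parent is the current one."""
        self.parent.append(self.current)
        self.active.append(True)
        self.current = len(self.parent) - 1
        return self.current

    def up(self) -> int:
        """Reactivate the parent tree; the old tree stays de-activated."""
        parent = self.parent[self.current]
        if parent is None:
            raise RuntimeError("already at the outermost scope")
        self.active[self.current] = False
        self.current = parent
        return self.current
```

A sequence down, down, up, down leaves two sibling headnodes under the first inner scope, of which only the second is active.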

The  other  type  in  this  group,  the  "D",  or  stack,  type  is 
used  to  copy  parse  stack  values  to  the  auxiliary  parse  stack  mentioned 
above.  This  allows  all  the  tokens  of  recursive  syntactic  productions 
(such  as  declares  -  hence  the  "D")  to  be  stacked  and  then  a  single 
operation  to  be  applied  to  them.  This  semantic  type  is  also  used  to 
define  operations  involving  the  parse  stack,  specifically  the  modification 
of  any  of  the  parse  stacks  other  than  that  containing  the  actual  tokens 
used  by  the  parsing  algorithm;  there  should  be  no  necessity  to  alter 
this stack. The need for this latter operation is not seen at the
moment, so no such operations are actually definable, although
they may easily be added if later desired.

Finally,  there  is  the  "R",  or  routine,  type  of  semantics. 

This is the safety valve for the semantics routines. Its syntax is
extremely simple:

<routine semantics> ::= <r-head> <routine name>

<r-head> ::= R | ROUTINE

If a language designer cannot, or does not wish to, express his semantics
for a production with the means provided, he merely specifies a routine
type of semantics for that production, specifying a procedure from the
library of user-defined routines (see section 4.3.2). He then inserts
his own routines (as many as he has declared in the library), written
in the base programming language, into a well-labelled spot in the
proto-processor and at execution of the processor's compiler his
routine is invoked. It is admitted that this goes right back to the
method employed in XPL [MK1] but, as the semantics language is enriched,
this feature should become less frequently used, although it will always
have to remain, "just in case".

This concludes the specification of all types of currently
supported semantics, but the construction of tables has so far been ignored.
However, the tables are quite simple and many similarities exist among them.

There  are  actually  only  two  tables,  both  BIT(16)  arrays.  They 
are  illustrated  in  Fig.  4.2.  The  first,  called  SEMANTICS,  is  indexed 
by  the  production  number  and  a  pointer  is  obtained  to  a  list  maintained 
in  the  same  array.  There  is  one  element  in  the  list  for  each  semantics 
record  specified  for  the  given  production.  Each  element  contains  two 
items  of  information  packed  together:  the  type  of  semantics,  and  a 
possible pointer to an auxiliary array called SYNTHESIZE_SEGMENT, which
stores lists of all information necessary to complete the semantics
specification. The information in these tables is printed in English,
following each syntactic production for which semantics
have been specified, and the tables are emitted.

Figure 4.2

Semantics Tables

[Diagram not reproduced; recovered labels: PRODUCTION NUMBER, SYNTHESIZE SEGMENT, LAST SEMANTICS]

These tables are used to store the semantics associated with
syntactic productions. Their generation is described in
section 4.3.3, and their use, in section 4.6.

SEMANTICS and SYNTHESIZE_SEGMENT are both BIT(16) arrays.
LAST_SEMANTICS and LAST_SYNSEG mark their sizes.

P# is the total number of syntactic productions.
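
The packing of a semantics type and a pointer into one BIT(16) element can be sketched as follows. The thesis does not state the bit layout, so the split assumed here (4 bits for the type, 12 bits for the pointer) and the numeric type codes are hypothetical, as are the function names.

```python
TYPE_BITS = 4         # assumed width of the semantics type field
POINTER_BITS = 12     # assumed width of the SYNTHESIZE_SEGMENT pointer field
TYPES = "PSALGTDR"    # the eight semantics types; the code assignment is assumed


def pack(sem_type: str, pointer: int = 0) -> int:
    """Pack a semantics type and a pointer into one 16-bit element."""
    code = TYPES.index(sem_type)  # raises ValueError for an unknown type
    if not 0 <= pointer < (1 << POINTER_BITS):
        raise ValueError("pointer does not fit in the pointer field")
    return (code << POINTER_BITS) | pointer


def unpack(word: int):
    """Recover (semantics type, pointer) from a packed element."""
    return TYPES[word >> POINTER_BITS], word & ((1 << POINTER_BITS) - 1)
```

Under these assumptions every packed element fits in 16 bits, and unpack(pack(t, p)) returns (t, p) for any valid type and pointer.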


4.4 TABLE-DRIVEN PARSER


The parsing algorithm contained in the proto-processor consists
of the procedures COMPILATION_LOOP, RECOVER and CONFLICT. The latter two
are used for taking remedial action when a syntax error is detected and
for convenience, respectively. The first embodies the deterministic
pushdown automaton (DPDA) whose action is controlled by the parsing tables
emitted by the generating program described earlier in this chapter. The
details  of  the  machine's  operation  are  described  in  Lalonde's  thesis  [LAI], 
and  shall  not  be  detailed  here.  The  operation  of  the  RECOVER  routine  is 
very  crude,  consisting  of  discarding  one  token  and  retrying  the  parse,  and, 
if  this  fails,  of  scanning  to  something  "hard"  in  the  source  before  retrying 
the  parse.  This  is  adopted  from  the  XPL  compiler  [MK1]  and,  although  not 
considered  satisfactory  for  this  project,  does  provide  some  means  of 
recovery.  A  better  procedure  is  under  consideration  but  will  not  be 
discussed  in  this  thesis. 
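
The two-stage recovery strategy can be sketched in Python. This is an illustration only: the predicate can_resume stands in for "retrying the parse", the choice of ";" as a "hard" token is an assumption, and the function does not reproduce the RECOVER procedure of the proto-processor.

```python
from typing import Callable, FrozenSet, Sequence


def recover(
    tokens: Sequence[str],
    pos: int,
    can_resume: Callable[[str], bool],
    hard_tokens: FrozenSet[str] = frozenset({";"}),
) -> int:
    """Return the position at which parsing should be retried.

    pos is the position of the token at which the syntax error was detected.
    """
    pos += 1  # stage 1: discard one token and retry
    if pos < len(tokens) and can_resume(tokens[pos]):
        return pos
    # stage 2: scan forward to something "hard" in the source before retrying
    while pos < len(tokens) and tokens[pos] not in hard_tokens:
        pos += 1
    return pos
```

If no hard token remains, the function returns the length of the token sequence, which a caller can treat as end of input.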

4.5 TABLE-DRIVEN SCANNER

The scanning algorithm is encoded in two procedures, called
SCAN and LITTLE_SCAN. The operation of the scanner is, of course,
controlled by the tables produced in an earlier phase and then merged
with these procedures in the final processor. In addition to these two
procedures, there are also procedures used to fill the buffer (SCBUF)
that the scanner uses, to sluff characters from the buffer as they are
recognized, and to print the source listing. The operation of the
scanner,  as  will  be  seen,  pre-supposes  a  large  full  buffer,  and  this  may 
require  reading  many  records  ahead.  However,  if  these  records  are  listed 
as  they  are  read  and  a  syntax  error  is  discovered  involving  items  at  the 
head  of  the  buffer,  then  any  error  message  produced  is  displaced  from  the 
card  to  which  it  refers.  This  is  the  reason  for  the  source  listing  being 
controlled  by  the  routine  which,  in  effect,  shifts  the  buffer  and 
replenishes  it  as  tokens  are  passed  to  the  parser.  The  process  can  be 
viewed  as  a  series  of  dependent  procedures,  as  in  a  pipeline,  with  the 
parsing  routine  COMPILATION_LOOP  calling  SCAN  (which  may  call  LITTLE_SCAN, 
as  is  discussed  below)  which,  when  a  token  is  recognized,  calls  SLUFFSCBUF 
which  may  call  PRINTCARD  and  then  calls  FILLSCBUF  which  may  call  GETCARD. 
The  mnemonic  names  of  each  procedure  should  serve  to  identify  its  purpose, 
and  the  operation  of  those  managing  the  buffer  and  performing  I/O  need 
not  be  described  because  of  their  simplicity.  However,  the  strategy 
employed  in  scanning  will  be  explained. 

The  SCAN  routine  takes  the  first  character  in  the  buffer 
and  uses  its  hexadecimal  value  to  index  into  the  table  SCANCHAR.  The 
value found there classifies the character as illegal, ambiguous, a
one-character terminal, the unique head character of a terminal, or the
unique head character of a non-terminal whose identification is left to
the scanner. An illegal character is sloughed, and an error message is
emitted. A length-one terminal is sloughed from the buffer and returned
to the parser. If
the  character  is  unique  and  the  head  character  of  a  terminal,  the  proper 
length  substring  of  the  buffer  is  checked  against  that  terminal.  No  match 
causes  the  character  to  be  re-classified  as  illegal,  while  a  match  causes 
the  return  of  the  entire  terminal  and  its  removal  from  the  buffer.  If  the 
character is unique and the head character of a particular terminal to be
recognized by the scanner (that is, of the type <...>), then that
terminal's reference number is passed to LITTLE_SCAN, which decides
whether that terminal exists in the buffer. If it returns to SCAN without
identifying a token, then the character is re-classified as illegal.

Finally,  if  the  character  is  ambiguous  (that  is,  it  is  the  head  character 
of  two  or  more  terminals)  it  is  checked  against  all  scanner-recognized 
terminals  using  LITTLE_SCAN.  If  LITTLE_SCAN  recognizes  a  token,  then  that 
token  is  checked  against  all  terminals  to  prevent,  for  example,  keywords 
being  returned  as  identifiers,  and  then  the  token,  or  terminal,  is  sloughed 
from  the  buffer  and  returned  to  the  parser.  If  no  match  is  made,  then  the 
appropriate  substring  of  the  buffer  is  checked  in  reverse  order  (to  find 
the  longest  matches  first)  against  all  terminals.  A  match  returns  that 
terminal  as  described  previously;  a  failure  causes  re-classification  of  the 
character  as  illegal.  As  was  noted  in  the  discussion  earlier  on  the 
semantics  tables  (and,  in  particular,  the  symbol  table)  generation,  routines 
exist in SCAN to load a scanner-identified terminal (that is, of the type
<...>) into the symbol table, or find its address if it already exists,
and set its initial attributes or modify its existing ones. In particular,
on the first appearance the name attribute is set to the character
string BCD returned by LITTLE_SCAN, the status attribute is set to the
parser token (such as <IDENTIFIER>) and the number of appearances attribute
is set to 1, while on subsequent appearances the only action is to increment
the number of appearances attribute.

Figure 4.3

Scan Tables

These tables are used to drive the scanning algorithm of the emitted
processor. Their generation is discussed in section 4.2, and their
use, in section 4.5. In addition to the above tables, the head
character decision array SCANCHAR(255) BIT(16) is produced.
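
The dispatch on the head character performed by SCAN can be sketched as follows. This is an illustrative Python model under stated assumptions, not the system's code: the toy tables TERMINALS, SCANNED and SCANCHAR are invented for the example, regular expressions stand in for the table-driven LITTLE_SCAN, and sorting the terminals by length replaces the "reverse order" search of the real tables. The buffer is assumed to be non-empty.

```python
import re

ILLEGAL, AMBIGUOUS, ONE_CHAR, TERMINAL_HEAD, SCANNED_HEAD = range(5)

# Hypothetical tables for a toy language.
TERMINALS = ["IF", "IN", ";", ":="]
SCANNED = {"<IDENTIFIER>": re.compile(r"[A-Z][A-Z0-9]*"),
           "<NUMBER>": re.compile(r"[0-9]+")}
SCANCHAR = {";": (ONE_CHAR, ";"),
            ":": (TERMINAL_HEAD, ":="),
            "I": (AMBIGUOUS, None)}      # heads IF, IN and identifiers
for c in "ABCDEFGHJKLMNOPQRSTUVWXYZ":
    SCANCHAR[c] = (SCANNED_HEAD, "<IDENTIFIER>")
for c in "0123456789":
    SCANCHAR[c] = (SCANNED_HEAD, "<NUMBER>")


def little_scan(goal, buf):
    """Stand-in for LITTLE_SCAN: matched text at the head of buf, or None."""
    m = SCANNED[goal].match(buf)
    return m.group(0) if m else None


def scan(buf):
    """Return (token, remaining buffer); token is None for an illegal character."""
    kind, ref = SCANCHAR.get(buf[0], (ILLEGAL, None))
    if kind == ONE_CHAR:
        return ref, buf[1:]
    if kind == TERMINAL_HEAD and buf.startswith(ref):
        return ref, buf[len(ref):]
    if kind == SCANNED_HEAD:
        text = little_scan(ref, buf)
        if text is not None:
            return ref, buf[len(text):]
    if kind == AMBIGUOUS:
        for goal in SCANNED:                 # try scanner-recognized terminals
            text = little_scan(goal, buf)
            if text is not None:
                # Keyword check: a literal terminal wins over the identifier.
                token = text if text in TERMINALS else goal
                return token, buf[len(text):]
        for word in sorted(TERMINALS, key=len, reverse=True):  # longest first
            if buf.startswith(word):
                return word, buf[len(word):]
    return None, buf[1:]                     # re-classified as illegal
```

For instance, scan("IF;") yields the keyword IF rather than an identifier, while scan("INDEX:=1") yields <IDENTIFIER> and leaves ":=1" in the buffer.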

The  procedure  LITTLE_SCAN  can  be  thought  of  as  a  finite  state 
machine  driven  by  the  scanning  tables  constructed  from  the  specifications 
for the scanner discussed earlier. The procedure has one argument: the
terminal whose presence in the scan buffer is to be tested. The purpose
of each array in the algorithm is explained by tracing the execution of
the procedure.

The  calling  argument  indexes  into  the  array  GOAL  to  obtain  the 
identification  number  of  the  terminal  that  will  be  returned  in  TOKEN 
should  the  test  be  successful.  The  arrays  GLENGTH  and  GOFFSET  are  also 
indexed  by  the  calling  argument  to  obtain  the  number  and  the  first 
location,  respectively,  of  the  terms  to  be  checked  for;  these  are  described 
in  the  following  arrays,  the  parallel  elements  of  which  correspond  to  one 
term  in  an  expression  for  the  given  terminal.  GMAX  and  GMIN  give  the 
upper  and  lower  bounds  for  the  repetition  of  the  term;  GSYM  and  GSYML 
give  the  first  of,  and  the  number  of,  the  alternate  strings  to  be  tested 
for. The value obtained from GSYM, and as many succeeding values as
specified in GSYML, are used as indices into GSTRING, which has two items
of information packed in each element. These are the index of the string
to be used for the test, and the absence or presence of the symbol in
the initial definition, thus signifying the use of the string itself or of
its complement. The index obtained from GSTRING references SCANSYM, a
character array containing the actual strings, and SCANSYMTYPE, a binary
array defining the type of match to be performed: all or any.

The reference to a finite state machine should be clarified.
The operation of LITTLE_SCAN involves several nested loops. The outermost
represents the progression from term to term, and should be thought of as
a state. Within this exists a loop that may be executed GMAX times, each
iteration adding one to the number of returns to that state. To complete
an iteration, a match must be obtained between the contents of the buffer,
indexed by a pointer SCANPTR, and one of the strings from the list for that
term. These tests are made by inner loops that can be regarded as sub-states
of the global state. At the first failure for all string matches, the
number of iterations successfully performed is compared to GMIN. If at least
GMIN iterations have been done, or once the loop has been done GMAX times,
the next state is entered, corresponding to the next term being tested.
During this time, an output buffer XTOKEN is built up, so that any
translations of the input stream indicated in the scanner specifications can
be made and, if a successful recognition of the terminal under consideration
occurs, the appropriate string can be returned instantly. Alternatively,
should a mismatch occur, the contents of the actual scan buffer have not
been altered, as is desired. If a match does occur, it is LITTLE_SCAN
that calls SLUFFSCBUF to remove the recognized string from the scan
buffer.
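
The nested loops of LITTLE_SCAN can be sketched as follows. This is a Python illustration under stated assumptions rather than the system's code. The array names follow the text, but their contents (one terminal, <IDENTIFIER>, defined as a letter followed by up to seven letters or digits) are invented. The packing of GSTRING is modelled as a pair, an "any" match is taken to consume one character of the set, an "all" match is taken to consume the whole string, and a complemented entry is assumed to consume one character when the plain test fails. Translations and the call to SLUFFSCBUF are omitted.

```python
LETTERS = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
DIGITS = "0123456789"

GOAL = ["<IDENTIFIER>"]     # token returned on success, per calling argument
GOFFSET = [0]               # first term of each terminal
GLENGTH = [2]               # number of terms of each terminal
GMIN = [1, 0]               # per term: lower bound on repetitions
GMAX = [1, 7]               # per term: upper bound on repetitions
GSYM = [0, 0]               # per term: first alternate in GSTRING
GSYML = [1, 2]              # per term: number of alternates
GSTRING = [(0, False), (1, False)]   # (index into SCANSYM, complemented?)
SCANSYM = [LETTERS, DIGITS]
SCANSYMTYPE = ["any", "any"]


def match_one(buf, ptr, entry):
    """Return how many characters of buf at ptr match this alternate (0 if none)."""
    index, complemented = GSTRING[entry]
    text = SCANSYM[index]
    if ptr >= len(buf):
        return 0
    if SCANSYMTYPE[index] == "all":
        hit, width = buf.startswith(text, ptr), len(text)
    else:
        hit, width = buf[ptr] in text, 1
    if complemented:
        return 0 if hit else 1
    return width if hit else 0


def little_scan(arg, buf):
    """Return (token, text) on success, (None, None) on failure; buf is untouched."""
    ptr = 0                  # plays the role of SCANPTR
    xtoken = []              # plays the role of XTOKEN
    for term in range(GOFFSET[arg], GOFFSET[arg] + GLENGTH[arg]):   # states
        count = 0
        while count < GMAX[term]:                                   # repetitions
            for entry in range(GSYM[term], GSYM[term] + GSYML[term]):
                n = match_one(buf, ptr, entry)
                if n:
                    break
            else:
                break        # every alternate failed: leave the repetition loop
            xtoken.append(buf[ptr:ptr + n])
            ptr += n
            count += 1
        if count < GMIN[term]:
            return None, None
    return GOAL[arg], "".join(xtoken)
```

With these tables, little_scan(0, "X25:=1") recognizes "X25", while little_scan(0, "9X") fails on the first term without disturbing the buffer.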

The  use  of  special  trapped  terminals  and  the  reserved  terminal 
<NULL>  was  discussed  in  the  section  on  scanning  table  generation.  If 
LITTLE_SCAN returns the terminal <NULL>, then SCAN simply calls LITTLE_SCAN
again and ignores the string returned in BCD (which will be replaced by the
second call). However, if the recognized terminal is one of those not
used by the parser and a special routine has been defined for it, then
that routine is called; otherwise, an error message is given. The
terminal is then re-classified as <NULL> before LITTLE_SCAN returns it,
so that it is ignored by SCAN.

In  addition  to  the  recognition  of  terminals  and  the  deletion  of 
blanks  (if  appropriate)  and  comments,  the  scanner  provides  the  mechanism 
for  the  batch  processing  of  student  jobs.  Two  variables  are  used  for 
this purpose: the end of file and end of block switches (EOFSW and EOBSW).
The  end  of  file  switch  is  set  off  in  the  main  initialization  procedure  of 
the  processor,  and  is  turned  on  when  the  procedure  FILLSCBUF  detects  an 
end  of  file  condition  on  the  input  batch  stream.  The  end  of  block  switch 
is  turned  off  in  the  initialization  procedure  of  the  compiler  phase,  and 
is  turned  on  if  an  end  of  block  reserved  character  string  is  defined  and  is 
detected  in  the  input  batch  stream,  or  if  an  end  of  file  condition  occurs. 
If  SCAN  is  called  with  EOBSW  on,  then  the  end  of  block  token  is  passed  to 
the  parser  if  one  has  been  defined;  otherwise,  the  reserved  token  <NULL> 
is  passed  to  the  parser,  thus  causing  a  syntax  error  due  to  the  presence 
of the illegal (to the parser) token <NULL>.
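
The behaviour of the two switches can be sketched as follows. This is a small Python model, not the system's code: the class BatchSwitches, its method names, and the test for the end of block string at the start of a record are assumptions made for illustration.

```python
class BatchSwitches:
    """Model of EOFSW and EOBSW as described in the text (names hypothetical)."""

    def __init__(self, eob_string=None, eob_token=None):
        self.eob_string = eob_string   # reserved end of block string, if defined
        self.eob_token = eob_token     # parser token for end of block, if defined
        self.eofsw = False             # set off in the main initialization
        self.eobsw = False

    def start_compiler_phase(self):
        self.eobsw = False             # set off for each job in the batch

    def note_record(self, record):
        """Called when the buffer is filled; record is None at end of file."""
        if record is None:
            self.eofsw = True
            self.eobsw = True          # end of file also ends the block
        elif self.eob_string is not None and record.startswith(self.eob_string):
            self.eobsw = True

    def forced_token(self):
        """Token SCAN passes to the parser while EOBSW is on, else None."""
        if not self.eobsw:
            return None
        return self.eob_token if self.eob_token is not None else "<NULL>"
```

When no end of block token is defined, forced_token returns <NULL>, which the parser rejects as a syntax error, as described above.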

4.6 TABLE-DRIVEN SEMANTICS ROUTINES

As  is  clear  from  the  previous  discussion  of  the  semantics  table 
generation,  all  semantics  actions  of  the  compiler  phase  of  the  emitted 
processor  are  initiated  by  the  application  of  syntactic  reductions  of 
the  source  language  in  the  course  of  parsing  it.  The  procedures  forming 
the parser were mentioned in a previous section; in particular, the
procedure COMPILATION_LOOP is the heart of the parser. This latter
routine, when the finite state machine it represents reaches a state
representing a syntactic production, calls the main semantic procedure
SYNTHESIZE (a name retained in honour of XPL [MK1]) with one parameter,
the number of
the  production  being  applied.  It  is  SYNTHESIZE  that  determines  if  any 
semantics  have  been  specified  for  that  production  (by  accessing  the 
array  SEMANTICS)  and,  if  there  are  any,  then  the  list  of  them  is  processed 
element  by  element,  thus  performing  the  semantic  action  specified,  such 
as  linking  program  segments  or  emitting  new  instructions.  This  processing 
consists  simply  of  unpacking  the  two  pieces  of  information  to  determine 
the  type  of  semantics  and  its  qualifier,  which  may  be  a  pointer  to  the 
auxiliary array SYNTHESIZE_SEGMENT or simply a further and sufficient
classification  of  semantics  type.  As  the  semantics  language  is  enriched, 
either  by  the  group  supporting  this  system  for  the  benefit  of  all  users, 
or  by  a  particular  user  to  incorporate  some  special  facilities  for  use 
in  writing  his  semantics  specifications,  the  semantics  list  processor, 
which  is  extremely  straightforward,  need  only  be  modified  to  add  a  call 
to  the  new  semantics  procedure.  There  is  a  separate  semantics  procedure 
for each type specified in the earlier section on the semantics tables.
To treat them in the order in which they were presented, the first is
the procedure for sequential semantics ("S").

The procedure SYNTHESIZE_SEQUENTIAL is simply a list processor
invoked by a sequential semantics specification. Its function is to
link existing instruction lists (program segments) or to add single
instructions to existing lists. It is passed one parameter which is the
index into the array SYNTHESIZE_SEGMENT used to obtain the list of
syntactic  elements  or  instructions  to  be  linked,  preceded  by  the  target 
syntactic  element.  Once  linkages  are  completed,  the  first  element  of 
the  resultant  program  segment  is  returned  to  the  element  of  the  parse 
stack  array  SEGMENT_STACK  specified  by  the  target  number.  All 
references  to  program  segments  associated  with  syntactic  tokens  use  the 
numbering  scheme  explained  in  the  discussion  of  sequential  semantics  in 
section  4.3.3.  Before  performing  any  of  the  specified  linkages,  all 
references  to  elements  in  the  parse  stack  must,  therefore,  be  relocated 
relative to the position of the left-side token of the production being
applied.¹ The position of this token in the stack is marked by the
pointer MP, and relocation must be done relative to MP-1.² To synthesize
new segments, the procedure uses three others: EMIT_SEGMENT, which is
a valued procedure which is passed an instruction op code and returns the
first element of the program segment generated for it; CAT_OP, which is a
valued procedure which is passed the address of the first element of an
existing segment and an op code, links the latter to the segment defined by
the former, and returns the former; and CAT_SEGMENT, which is a valued
procedure which is passed the addresses of the first elements of two program
segments, links the latter segment to the former and returns the address of
the former. These procedures perform error checking to the extent of
detecting invalid operations in the segments they are linking, as well as
verifying the existence of the proper terminating op code ("RET") at the end
of each segment. An invalid operation, which can result only from incorrect
semantics specifications, is replaced by a termination code and an error
message is produced. These procedures also place a termination on all
resultant segments, if necessary, before returning. The structure of
segments and their linkages and invocations is discussed as part of the
emulator in a later section (see also Fig. 4.5).

¹ Note that the collapse of the parse stack, signifying the application
of the production, occurs upon return from the procedure SYNTHESIZE.

² The use of MP-1 is necessitated as element zero, after the production
is applied, occupies the same location held by element one before the
production was applied. To be consistent, all references to the production
token zero are changed to one without the knowledge of the user.
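
The three segment procedures can be sketched as follows. This is a minimal Python model, not the system's code. It assumes that a segment is a run of cells in SEGMENT ending in "RET", and that linking overwrites that "RET" with a branch cell ("BR", address); the text says only that linkages appear as branches within the array, so this representation is an assumption. The checking for invalid operations is reduced to detecting a missing "RET".

```python
SEGMENT = []   # model of the pseudo-machine's program memory


def emit_segment(op):
    """Create a one-instruction segment and return the address of its first cell."""
    addr = len(SEGMENT)
    SEGMENT.extend([op, "RET"])
    return addr


def _tail(addr):
    """Follow a segment to the address of its terminating RET."""
    while True:
        if addr >= len(SEGMENT):
            raise ValueError("segment lacks a terminating RET")
        cell = SEGMENT[addr]
        if cell == "RET":
            return addr
        # A tuple cell is a branch ("BR", target); anything else is sequential.
        addr = cell[1] if isinstance(cell, tuple) else addr + 1


def cat_segment(first, second):
    """Link segment `second` to the end of `first`; return `first`."""
    SEGMENT[_tail(first)] = ("BR", second)
    return first


def cat_op(first, op):
    """Link a single op code to the end of segment `first`; return `first`."""
    return cat_segment(first, emit_segment(op))


def flatten(addr):
    """List the op codes of a segment in execution order (for inspection only)."""
    ops = []
    while SEGMENT[addr] != "RET":
        cell = SEGMENT[addr]
        if isinstance(cell, tuple):
            addr = cell[1]
        else:
            ops.append(cell)
            addr += 1
    return ops
```

For example, emitting "LOAD", appending "ADD" with cat_op, and then linking a separately emitted "STORE" segment with cat_segment gives a single segment whose flattened form is LOAD, ADD, STORE.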

In  addition  to  these  user-directed  operations,  source  reference 
instructions  may  be  automatically  added  at  the  beginning  and  end  of  the 
final  resultant  segment  before  its  first  element  is  stored  in  the  parse 
stack as specified by the target element. During compilation a
CARD_COUNT and GENERATED_TOKEN are used to store the current source record
number, and the syntactic token last produced. Kept with these are
switches set whenever they are updated, and if these switches are on,
then instructions are emitted at the head of the new segment to set the
source record count during execution, and at the head and tail of the new
segment to push and pop the token being generated at execution time.
These source references permit run-time error messages to specify
"while calculating/executing a ... beginning on line ..." and, by
using a stack, to dump those syntactic entities waiting to be completed.
Further details on this operation are left to the next section on the
emulator.

Now  that  the  functioning  of  the  sequential  linkage  for  program 
segments  has  been  discussed,  it  is  worthwhile  mentioning  the  use  of  a 
target  specification  in  such  semantics. 

Clearly,  the  resultant  of  any  semantics  involving  segments 
must  be  returned  to  the  parse  stack  which  maintains  pointers  to  the 
program segments, namely the array SEGMENT_STACK. Further, as the
application  of  the  production  collapses  the  stack,  the  last  segment 
semantics  must  update  the  first  element  of  the  production  group, 
referenced  in  the  stack  by  the  pointer  MP.  However,  the  specification 
of  semantics  may  require  successive  applications  of  different  semantics, 
with  the  intermediate  results  being  stored  temporarily  between  semantic 
operations.  To  permit  this,  any  element  of  the  production  group  may 
have  its  associated  program  segment  pointer  replaced  by  the  resultant 
of  a  segment-type  semantic  operation.  This  is  done  by  specifying  its 
reference  number  in  the  production  as  the  target  in  the  input  record 
(i.e.  the  object  of  the  "AT").  Obviously,  the  user  must  ensure  that 
he  replaces  only  those  segment  references  with  which  he  is  finished. 

The second type of segment semantics, the alternate type,
generates basically a one-instruction program segment, which may be
preceded by an automatically generated instruction to update the source
record count and/or bracketed by automatically generated push and pop
instructions for the execution-time parse trace (as was detailed above).
It, too, works on a list in SYNTHESIZE_SEGMENT, obtained as above,
which yields the target element and a list of production element pointers
and/or single instruction op codes. In this case, all single op codes
are used to generate one-instruction segments, and the addresses of the
first elements of these and the addresses of the first elements of all
segments obtained from the parse stack as specified are linked in a
sequential list in the array SEGMENT, preceded by the count of the total
number of actual elements in the list. This count is used for bounds
checking when a selective execution is performed at run time. The address
of the count, which is the first element of the list, is stored in the
one instruction then emitted as a new program segment (as above), and the
address of this latter segment is returned to the element of the parse
stack array SEGMENT_STACK specified by the target. The alternate
instruction may then be linked to other program segments by further semantics
instructions.
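
The selector list built by the alternate semantics can be sketched as follows. This is an illustrative Python model, not the system's code: the op code name "ALT", the helper emit, and the use of 1-based selection at run time are assumptions. Only the layout (a count followed by the segment addresses, with the address of the count stored in the one emitted instruction) is taken from the text.

```python
SEGMENT = []   # model of the pseudo-machine's program memory


def emit(*cells):
    """Append cells to SEGMENT and return the address of the first one."""
    addr = len(SEGMENT)
    SEGMENT.extend(cells)
    return addr


def synthesize_alternate(items):
    """Build a selector list from segment addresses (int) and op codes (str).

    Returns the address of the one-instruction segment that refers to it.
    """
    # Single op codes are first expanded into one-instruction segments.
    addrs = [emit(x, "RET") if isinstance(x, str) else x for x in items]
    list_addr = emit(len(addrs), *addrs)        # count, then the addresses
    return emit(("ALT", list_addr), "RET")      # instruction holding the list address


def select(alt_addr, k):
    """Run-time selection of the k-th alternative, with bounds checking."""
    _, list_addr = SEGMENT[alt_addr]
    count = SEGMENT[list_addr]
    if not 1 <= k <= count:
        raise IndexError("selector out of range")
    return SEGMENT[list_addr + k]
```

The count stored ahead of the addresses is what allows select to reject an out-of-range selector before any address is fetched.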

The link semantics is the next to be detected by SYNTHESIZE.
The procedure SYNTHESIZE_LINK is called with one parameter, this again
being a pointer to the auxiliary semantics array SYNTHESIZE_SEGMENT.
From this latter array, the target reference and the list of segment
pointers and/or op codes are obtained. The target element is verified
to be a pointer to an alternate-type instruction, whose selection list
is then found. This list in the program array SEGMENT is then chained
in order to the segments referenced from the list in the semantics
array, with op codes being expanded to form independent segments that
can then be added to the chain. If the target reference is found not
to be an alternate-type instruction, then an error message is given
and no semantic action is taken.

The next group to be handled are the table and stack semantics.
These are executed by calls to the routines SYNTHESIZE_TABLE and
SYNTHESIZE_STACK, respectively. The parameter passed to either of these
procedures is the index of a list in the auxiliary array SYNTHESIZE_SEGMENT,
the first element of which identifies the particular type of table or stack
semantics involved and is followed, where applicable, by the list
of parse stack or auxiliary parse stack elements to be used, as well as
pointers to any constants to be used (such as values to be assigned as
a symbol's attribute). Code emission, if necessary, is achieved via
the same segment procedures used by the SYNTHESIZE_SEQUENTIAL procedure
and others described above. These semantics types are the least
formalized of the semantics specifications at the time of the writing of this
thesis; the particular ad hoc techniques used are given in the appendices.
However, the table and stack operations are under consideration by the
author and other project members, and should be formalized in the near
future.

The last semantics type, the routine type, is the simplest.
The SYNTHESIZE procedure removes the high-order part of the number taken
from the array SEGMENT which specifies that this was, in fact, an "R" type
semantics specification, and passes the remaining low-order part, which
specifies the routine to be invoked, as a parameter to the procedure
SYNTHESIZE_ROUTINE. Unless this routine is modified by the user, an
error message is then produced, stating that the routine (whose name is
obtained from the library specified by the user; see Semantics Table
Generation) has been invoked, but does not exist. Compilation then
continues.

At  the  completion  of  each  type  of  semantics,  the  next  element 
in  the  array  SEMANTICS  is  accessed.  If  it  is  a  terminator,  then  a 
return  is  made  from  SYNTHESIZE;  otherwise,  the  semantic  action  for  it 
is  performed.  This  continues  for  each  production  until  all  the  semantic 
specifications  have  been  handled. 
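
The list processing performed by SYNTHESIZE can be sketched as follows. This is a Python illustration under stated assumptions, not the system's code: the 4-bit type and 12-bit qualifier split of a 16-bit element, the zero terminator, the numeric type codes, and the mapping first from production number to list position are all invented for the example.

```python
TYPE_SHIFT = 12                        # assumed: high-order 4 bits hold the type
QUAL_MASK = (1 << TYPE_SHIFT) - 1      # assumed: low-order 12 bits hold the qualifier
TERMINATOR = 0                         # assumed end-of-list marker

# Hypothetical codes for the semantics types named in the text.
S_TYPE, A_TYPE, L_TYPE, T_TYPE, K_TYPE, R_TYPE = range(1, 7)


def pack(kind, qualifier):
    """Pack a semantics type and its qualifier into one element."""
    return (kind << TYPE_SHIFT) | qualifier


def synthesize(production, semantics, first, handlers):
    """Perform every semantic action listed for `production`.

    semantics -- packed elements, each list ended by TERMINATOR
    first     -- production number -> index of its list (absent if none)
    handlers  -- semantics type -> procedure taking the qualifier
    """
    i = first.get(production)
    if i is None:
        return                          # no semantics specified
    while semantics[i] != TERMINATOR:
        kind = semantics[i] >> TYPE_SHIFT
        qualifier = semantics[i] & QUAL_MASK
        handlers[kind](qualifier)       # e.g. SYNTHESIZE_SEQUENTIAL, SYNTHESIZE_ROUTINE
        i += 1
```

Adding a new semantics type then amounts to adding one entry to handlers, which corresponds to the single added call mentioned earlier in this section.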

At  the  end  of  compilation,  provided  the  parse  has  been 
successful,  the  program  segment  associated  with  the  only  element  left 
on  the  parse  stack,  namely  the  goal  symbol,  is  the  entire  program.  The 
value of SEGMENT_STACK(0) is therefore passed to the emulator as the
point  at  which  to  begin  execution. 

Figure 4.4

Symbol Table Structure and Memory Access

LINK STRUCTURE OF HEADNODES:

This permits linkage to global symbol tables at either
compile-time or run-time in the emitted processor.

FORMAT OF THE SYMBOL TABLE TREE NODES:

Figure 4.5

Object Program Format in Pseudo-Machine Memory

The concept of program segments is introduced in section 3.8.
Section 4.3.3 details the means of specifying compiler actions
relating to segments, section 4.6 describes the actual operations
of the compiler in producing these program segments, and section 4.7
describes the execution of them. The three examples here illustrate
the basic structures that appear in the object programs of the
pseudo-machine.

A program segment:

Two segments linked to form one:

4.7 EMULATOR


The  discussion  of  the  pseudo-machine  used  in  this  language 
processor  will  begin  with  the  stored  program  concept  it  uses.  As  was 
mentioned  in  Chapter  1,  and  was  implied  throughout  the  previous  section, 
programs are compiled directly into the BIT(16) array SEGMENT, which
serves  as  the  pseudo-machine's  program  memory.  Program  segments  exist  in 
this  array,  consisting  of  sequential  instructions  and  branches  within  the 
array  where  linkages  were  made  by  the  compiler.  Selector  lists  of  segment 
addresses  for  selective  executions  are  also  maintained  in  this  array.  In 
addition  to  this  linear  memory,  a  control  stack  and  an  accumulator  stack 
are  used  by  the  pseudo-machine,  as  well  as  the  symbol  table,  retained  from 
the  compilation  phase.  The  use  of  the  two  stacks  is  explained  below,  while 
the  architecture  of  the  pseudo-machine  is  summarized  in  the  appendices. 

The  execution  control  for  the  machine  is  based  on  the 
execution  of  program  segments  to  their  completion.  Execution  of  a 
compiled  program  is  initiated  by  clearing  the  array  SEGMENT_STACK  used 
in compilation and then calling the procedure EMULATION_LOOP. This
procedure (which is illustrated in Fig. 4.6) pushes the address of the
first instruction in the program (as passed from the compiler) onto
SEGMENT_STACK, and then calls the procedure EXECUTE_SEGMENT with this
address as the only argument. EXECUTE_SEGMENT executes instructions
sequentially, starting with the one at the address of the calling
argument, with the exception of in-segment branches that are merely
links  to  other  parts  of  the  program  list.  Most  operations  have  no 
effect  on  the  program  execution  sequence,  and  these  will  be  discussed 
below.  For  now,  however,  only  those  few  affecting  the  program  control 
are  considered. 

If the segment is executed to completion, the "RET", or
return, instruction is executed last, a pointer called SEGMENT_RESTART
is zeroed, and a return is issued from EXECUTE_SEGMENT. On return to
EMULATION_LOOP, the pointer is checked and, if it is zero, the top
element of SEGMENT_STACK is popped. If the stack is thereby depleted,
execution is terminated. However, if the segment is not completed,
but instead another segment is invoked (as in a CALL), then
SEGMENT_RESTART is set to the next instruction (the return point after the
call) and a pointer called SEGMENT_CALLED is set to the address of the
first instruction of the segment to be executed. EXECUTE_SEGMENT
then returns to EMULATION_LOOP which, seeing SEGMENT_RESTART to be
non-zero, replaces the top element of SEGMENT_STACK by SEGMENT_RESTART,
and then pushes SEGMENT_CALLED, if non-zero, on to the stack. The
action is then the same as for a normal return. The stack is not
depleted, however, so instead of terminating execution, EXECUTE_SEGMENT
is called with the top element of SEGMENT_STACK as its argument. Thus,
when an invoked segment is terminated, the stack is popped and execution
of the calling segment is resumed at the instruction following the call.

The last special case is that for a direct transfer (as in
a GOTO). In this case, the address of the segment instruction to
be branched to is obtained from the symbol table where it was loaded
during compilation and SEGMENT_RESTART is set equal to it.
SEGMENT_CALLED is zeroed and a return to EMULATION_LOOP made. This results
in SEGMENT_RESTART replacing the top element of SEGMENT_STACK, but
SEGMENT_CALLED, being zero, is not pushed onto the stack. The transfer is
thus effected.
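
The interplay of EMULATION_LOOP and EXECUTE_SEGMENT for returns, calls and direct transfers can be sketched as follows. This is a Python model, not the system's code. Instructions are represented as (op, argument) pairs in a dictionary keyed by location, the op names "OP", "CALL", "GOTO" and "RET" are illustrative, and the toy program PROGRAM is hypothetical, chosen only so that its first stack states agree with those quoted in Fig. 4.6. Location 0 is left unused so that zero can mean "no address". The popping of extra restart addresses on a branch to an outer scope is not modelled.

```python
def execute_segment(segment, pc, trace):
    """Run from pc; return (SEGMENT_RESTART, SEGMENT_CALLED)."""
    while True:
        op, arg = segment[pc]
        if op == "RET":
            return 0, 0              # segment completed
        if op == "CALL":
            return pc + 1, arg       # resume after the call; invoke arg
        if op == "GOTO":
            return arg, 0            # direct transfer; nothing invoked
        trace.append(arg)            # ordinary instruction: record and continue
        pc += 1


def emulation_loop(segment, start):
    """Return (trace of ordinary instructions, control stack after each step)."""
    stack = [start]                  # SEGMENT_STACK; the top is stack[-1]
    trace, history = [], []
    while stack:
        restart, called = execute_segment(segment, stack[-1], trace)
        if restart:
            stack[-1] = restart      # replace top by the restart address
            if called:
                stack.append(called)
        else:
            stack.pop()              # normal return
        history.append(tuple(reversed(stack)))   # left-most is top-most
    return trace, history


# Hypothetical program consistent with the stack states quoted in Fig. 4.6.
PROGRAM = {1: ("CALL", 7), 2: ("OP", "b"), 3: ("RET", None),
           7: ("CALL", 9), 8: ("RET", None),
           9: ("OP", "a"), 10: ("RET", None)}
```

Running emulation_loop(PROGRAM, 1) produces the stack states (7,2) and (9,8,2) after the instructions in locations 1 and 7, and ends with the stack depleted.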

An extra consideration is the problem of branching into an outer
scope, when a return must be effected for each inner scope being exited.
The updating of the lexic level counter and the popping of the corresponding
number of restart segment addresses from SEGMENT_STACK are performed by the
instruction used for such direct transfers within the EXECUTE_SEGMENT
procedure, thus simplifying the control mechanism in EMULATION_LOOP.

The  routine  EXECUTE_SEGMENT  is  primarily  concerned  with  the 
non-branching,  data  manipulation  instructions.  It  performs  all  data 
operations  using  one  accumulator  stack,  which  actually  consists  of  three 
parallel  arrays:  one  of  8  bit  elements  to  indicate  the  type  of  data  stored, 
one  of  32  bit  elements  for  integer  values  and  descriptions,  and  one  of  string 
descriptors  for  the  storage  of  character  strings.  The  use  of  these 
separate  stacks  was  dictated  by  the  use  of  XPL  as  the  initial  base 
language. All arithmetic, string, and logical operators are
zero-operand instructions occupying one element of SEGMENT, with the
operand(s) assumed to be the top (or top two) elements of the accumulator
stack. Prior to the execution of the operation itself, the types of the
operands are checked by a special procedure TYPE_CONVERSION which is called
with the accumulator stack index and required type as operands. If
conversion is required, this routine does it and then returns; otherwise
it returns instantly. If a user defines his own data types, their
conversion to other types and vice versa can be defined here. In
operation, the machine functions much like the Burroughs B5500 [BU1],
so that the operands of an instruction are popped and replaced by the
resultant. Other special instructions occupying one SEGMENT element exist
to set internal counters, such as the source record counter, and to
perform symbol table accesses, such as the load address instruction;
these instructions have packed with them one operand. Finally, some
instructions (notably the push generated token command) are two segment
elements in length. The actual operation of each instruction is not
detailed here; the program listings themselves may be consulted and a
programmer's reference manual for the system is in preparation.

Figure 4.6

Execution Control Stack

To illustrate the operation of the execution control stack, suppose
that the following program is in the memory of the pseudo-machine
(i.e. in the array SEGMENT), and that location 1 contains the first
instruction of the program.

At the start of execution, the control stack contains (1).

After the instruction in location 1 is executed, the stack contains (7,2).
(The left-most in the list represents the top-most on the stack).

After the instruction in location 7 is executed, the stack contains (9,8,2).

After the instruction in location 3 is executed, the stack is depleted
and the program execution ends.
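
The accumulator stack and the role of TYPE_CONVERSION can be sketched as follows. This is a Python illustration, not the system's code: only two data types and two operators ("ADD" and "CAT") are modelled, the class and method names are invented, and an illegal conversion simply raises an exception instead of invoking the recovery described in the text.

```python
INT, STR = 0, 1   # hypothetical type tags


class Accumulator:
    """Accumulator stack held as three parallel arrays, as in the text."""

    def __init__(self):
        self.types, self.ints, self.strs = [], [], []

    def push(self, value):
        is_int = isinstance(value, int)
        self.types.append(INT if is_int else STR)
        self.ints.append(value if is_int else 0)
        self.strs.append("" if is_int else value)

    def pop(self):
        kind = self.types.pop()
        number, text = self.ints.pop(), self.strs.pop()
        return number if kind == INT else text

    def type_conversion(self, index, required):
        """Convert the element at `index` to `required`, if it is not already."""
        if self.types[index] == required:
            return                                  # nothing to do
        if required == INT:
            self.ints[index] = int(self.strs[index])
        else:
            self.strs[index] = str(self.ints[index])
        self.types[index] = required


def execute(op, acc):
    """Zero-operand instruction: operands are the top two stack elements."""
    required = INT if op == "ADD" else STR          # "CAT" needs strings
    acc.type_conversion(-1, required)
    acc.type_conversion(-2, required)
    right, left = acc.pop(), acc.pop()
    acc.push(left + right)                          # operands replaced by the resultant
```

Pushing 2 and the string "40" and then executing "ADD" leaves the integer 42 on the stack, since the string operand is converted before the operation is applied.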

Error  handling  was  discussed  at  the  end  of  Chapter  3,  and  the 
guidelines  set  down  there  are  followed  in  the  emulator.  Errors  are 
categorized  as  simple,  severe,  fatal,  and  a  special  class  caused  by 
system  deficiencies,  such  as  too  small  an  accumulator  stack.  All  errors 
occurring  at  run  time  cause  an  error  message  describing  the  problem  and 
its  severity  to  be  printed,  followed  by  a  dump  of  the  present  execution 
point  relative  to  the  source  program,  a  trace  of  the  past  several 
instructions,  a  trace  of  the  parse  items  pending  completion  and  the 
token  currently  being  calculated  or  executed.  Variables  involved  in 
the operation are dumped together with their history (where last acquired
a value, etc.). Finally, if the error is not fatal, recovery is attempted,
for  example,  by  assuming  a  result  if  an  illegal  conversion  or  operation 
is  attempted,  or  supplying  default  values  for  a  variable  used  before 
acquiring a value. Also, if a job is to be terminated for running overtime
or exceeding an output limit, then a trace is invoked for a short
time  before  terminating  the  job  so  that  the  programmer  can  see  where  the 
program  was  executing  when  it  was  cancelled.  This  diagnostic  capability 
may  seem  to  be  too  elaborate,  but  it  is  just  such  a  facility  which  can 
make  a  processor  for  student  languages  most  effective,  and  more  than 
justify  the  additional  overhead  required  by  an  emulator. 
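
The error-handling sequence described above (message, dump, then recovery unless the error is fatal) might be sketched as follows. The names `Severity` and `handle_error`, the layout of `state` and the wording of the report lines are all assumptions for illustration; the emulator's actual routines are not reproduced here.

```python
from enum import Enum


class Severity(Enum):
    SIMPLE = 1
    SEVERE = 2
    FATAL = 3
    SYSTEM = 4  # caused by a system deficiency, e.g. too small a stack


def handle_error(severity, message, state, recover=None):
    """Build the diagnostic report, then attempt recovery unless fatal.

    Returns (report lines, whether execution may continue).
    """
    report = [
        f"{severity.name} ERROR: {message}",
        f"SOURCE RECORD: {state['source_record']}",
        "RECENT INSTRUCTIONS: " + " ".join(state["recent_instructions"]),
        "PENDING PARSE ITEMS: " + " ".join(state["pending_items"]),
        f"CURRENT TOKEN: {state['current_token']}",
    ]
    # Variables involved in the operation, with where each last got a value.
    for name, last_set in state["variables"].items():
        report.append(f"VARIABLE {name}: LAST ASSIGNED AT RECORD {last_set}")
    if severity is Severity.FATAL or recover is None:
        return report, False
    recover(state)  # e.g. assume a result or supply a default value
    return report, True
```

Passing the recovery action as a function keeps the reporting step identical for every error class, which is the point of the uniform diagnostic described in the text.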

4.8  PROCESSOR  GENERATION 


The  final  phase  of  this  system  performs  the  actual  merging  of 
the  parsing  tables,  the  scanning  tables  and  the  semantics  tables  generated 
in  the  first  three  phases  described  previously  with  the  proto-processor 
containing  the  table-driven  parser,  the  table-driven  scanner,  the  table- 
driven  semantics  routines,  the  emulator  and  other  miscellaneous  procedures. 
Included  in  this  latter  group  are  time  and  date  routines,  string  matching 
and  searching  routines,  dump  and  trace  routines,  separate  error  routines 
for  errors  detected  by  the  compiler  and  by  the  emulator,  and  a  supervisory 
set of procedures for batch job processing. Although the emitted source
program¹ is designed to be a complete language processor, the XPL philosophy
of structuring the program as a framework to facilitate user modifications
was followed. Such an organization, stressing the sub-division of tasks
into several separate procedures, increases ease of modification and
also permits independent and consequently simpler debugging of them,
thus improving the overall usefulness of the system.

¹ The emitted program is in XPL at the time of the writing of this thesis.
It shall be converted to BPL (base programming language) once this language
has been implemented and adopted by the SLAP project group.

CHAPTER 5


SUMMARY 


Although the implementation of the entire system described in
the preceding chapters is not complete at the time of writing of this
thesis, the design of the system (with the few exceptions noted) is
complete¹. Further, the implementation has been carried to a point
where the feasibility of the system in producing practical processors
is proven; only the time-consuming task of encoding and debugging those
features designed and not yet implemented remains.

¹ APPENDIX III details the status at the time of writing.

The parser generator implemented by W.R. Lalonde [LA1] is fully
operational and permits languages of complexity LALR(1) to be recognized.

The original schemes for specifying a scanning mechanism and
semantics for the pseudo-machine have been implemented. The scanner
generated by this scheme has been found to be approximately twenty-five per
cent faster in a few sample programs than the character-by-character
scanner used in XPL [MK1]. The capability of the semantics language and
the emulator should be sufficient for most student or commercial languages,
with the exception of a good I/O facility.

The language XPL, a rich subset of PL/I, has been specified
syntactically and semantically in the meta-languages described. This
includes procedure and block structures, expressions, various DO types,
conditionals and declaration statements.

The  extensibility  of  the  pseudo-machine  assures  the  sufficiency 
of  the  machine  to  handle  any  computations  within  the  scope  of  the  hardware, 
as  the  base  language  in  which  supplementary  routines  are  written  permits 
the  programmer  to  specify  actual  machine  code  to  be  emitted  as  part  of  the 
program  forming  the  emulator.  Such  a  technique  violates  the  initial  goal 
of  specifying  semantics  simply  and  at  a  high  level,  but  its  availability 
ensures  the  pseudo-machine's  capability. 

The  efficiency  of  the  semantics  routines  emitting  code  has  not 
yet been evaluated relative to other works, as there is no suitable benchmark.
However, their simplicity of operation and the small amount of
emitted  code  (permitted  by  the  structure  of  the  pseudo-machine)  should 
yield  very  fast  compilation  rates;  sample  programs  are  now  being  compiled 
at  a  rate  bounded  by  disk  access  time. 

The concept of program segments has been known for some time as
a means of rapidly processing zero-operand instructions; however, the
linkage mechanisms in the semantics language and the use of selector lists
are original. The design of the emulator was influenced by the accumulator
stack and the principles of operation of the Burroughs B5500 [BU1], and
by previous work done on object-time control and pseudo-machines [PU1,
RA1, WO1]. In particular, the decision to use a zero-address stack
machine for computations, as opposed to a two or three operand, storage-
to-storage machine, was made because of such a machine's suitability to
the semantic actions associated with the syntax-directed compilation
procedure, as illustrated by the success with just such a scheme in the
SPL pseudo-machine [WO1] and the problems encountered with the multiple-
address machine of PLUTO [PU1].

The  diagnostics  produced  by  the  error  handling  routine  of  the 
emulator are, in large measure, adopted from the work of J.M. Pullam [PU1],
although  they  have,  of  necessity,  been  expanded  to  relate  to  the  syntax 
of  the  source  language  as  it  was  specified  to  the  processor  generator, 
rather  than  just  to  the  syntax  of  PL/I. 

In  addition,  all  of  the  programs  contained  in  the  system,  with 
the  exception  of  the  parsing  tables  generator  and  the  parsing  algorithm 
procedures of W.R. Lalonde [LA1], are the work of the author.

In  conclusion,  the  system  whose  development  is  described  in 
this  thesis  is  practical  and  useful.  The  processor  generator  permits 
the  high-level  specification  of  the  syntax  and  semantics  of  languages 
at  the  complexity  of  Algol.  The  diagnostics  produced  by  this  phase  are 
sufficient  for  language  development  purposes,  and  the  object  code 
produced  for  the  emitted  processor  (written  in  the  base  language  XPL-plus) 
is  efficient  enough  to  make  the  processor  practical  for  student  job 
operation. 

The  processor  itself  has  an  excellent  diagnostic  capability, 
although  many  error  messages  now  produced  will  have  to  be  oriented  to 
student  users.  It  has  sufficient  compilation  speed  to  make  it  practical 
for  the  batch  processing  of  student  jobs,  and  yet  is  not  prohibitive  in 
space overhead¹.

¹ The proto-processor with table framework now occupies approximately
50K bytes and will, when finished, occupy about 80K bytes.

The system has demonstrated the capability of a processor
generator  system  to  produce  a  family  of  compatible  student  language 
processors,  and  it  forms  the  basis  for  the  development  of  an  efficient 
language  processor  for  educational  or  commercial  use. 

APPENDIX I

SYNTAX  OF  PROCESSOR  SPECIFICATIONS 

SECTION  I:  SYNTAX  SPECIFICATIONS 

The following is an abridged version of the input to the
parse table generator. For a complete specification, the reader should
consult [LA1].

The first input record should suffice for most users; refer
to [LA1] for more details.

OPTIONS (<input>,NOFSMK,LALR,EXTRAT)
where <input> ::= FINPUT | AINPUT

The  selection  of  <input>  specifies  the  format  used  for  the  language's 
syntax  specification  which  must  follow. 

For  FINPUT: 

1.  all  blanks  are  ignored; 

2.  all tokens are delimited by one of the characters ":", ";", "," or ".";

3.  a  token  consists  of  at  least  one  non-blank  character; 

4.  a)  :  =>  the preceding token was a production head;

    b)  ;  =>  the start of a new production with the same
               production head as the previous production
               (preceding token is the tail of preceding production);

    c)  ,  =>  preceding token is followed by another token;

    d)  .  =>  preceding token is the tail of preceding production.
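
The four FINPUT rules can be read as a small state machine. The following sketch is an illustration only, not the generator's code; it assumes that the delimiters are exactly the four characters of rule 4 and that no token contains one of them. The function name and the sample grammar are hypothetical.

```python
def parse_finput(text):
    """Return a list of (head, [symbols]) productions read from FINPUT text."""
    productions = []
    head, body, token = None, [], ""
    for ch in text:
        if ch.isspace():
            continue  # rule 1: blanks are ignored
        if ch not in ":;,.":
            token += ch  # still inside a token
            continue
        if not token:
            raise ValueError(f"empty token before {ch!r}")  # rule 3
        if ch == ":":  # 4a: the token was a production head
            head, body = token, []
        else:
            if head is None:
                raise ValueError("symbol before any production head")
            body.append(token)  # 4c: the token belongs to the current tail
            if ch in ";.":  # 4b, 4d: the production is complete
                productions.append((head, body))
                body = []
                if ch == ".":
                    head = None  # the next production names a new head
        token = ""
    return productions


print(parse_finput("EXPR : TERM ; EXPR , + , TERM . TERM : ID ."))
```

Here the ";" closes the first EXPR production while retaining EXPR as the head of the second, which is the saving that rule 4 b) provides.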

For  AINPUT: 

1.  only  one  production  per  card; 

2.  the  production  goal  symbol  must  start  in  column  1; 

3.  each  symbol  is  delimited  by  a  blank  character;  all  characters 
between angle brackets "<" and ">" are picked up (no sloughing);

4.  a  new  production  with  the  same  production  goal  as  the  last 
production  need  not  have  its  production  goal  repeated  if  it 
starts beyond column 1.
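
A minimal sketch of reading AINPUT cards under the four rules above follows. It assumes that the first symbol on a card beginning in column 1 is the production goal and that the remaining symbols form the right-hand side with no separating "::="; the function name and the sample cards are hypothetical.

```python
import re

# A bracketed name is taken whole, blanks included; anything else is
# a run of non-blank characters (rule 3).
SYMBOL = re.compile(r"<[^>]*>|\S+")


def parse_ainput(cards):
    """Return a list of (goal, [symbols]) productions, one per card."""
    productions, goal = [], None
    for card in cards:
        if not card.strip():
            continue  # skip empty cards (an assumption)
        symbols = SYMBOL.findall(card)
        if not card[0].isspace():  # rule 2: the goal starts in column 1
            goal, symbols = symbols[0], symbols[1:]
        elif goal is None:
            raise ValueError("continuation card before any goal symbol")
        productions.append((goal, symbols))  # rule 1: one production per card
    return productions


cards = [
    "<expr> <term>",
    "       <expr> + <term>",  # rule 4: same goal as the previous card
    "<term> <primary item>",
]
for production in parse_ainput(cards):
    print(production)
```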

SECTION  II:  SCANNER  SPECIFICATIONS 


The  syntax  and  input  format  of  the  scanner  specifications  is 
described  in  more  detail  in  section  4.2.  For  this  syntax,  the  normally 
used "<>" in BNF is replaced by "{}".

{scanner specification} ::= {non-terminal definition} | {string definition}

{non-terminal definition} ::= {non-terminal} {term list}

{term list} ::= {term} | {term list} § {term}

{term} ::= {repetition operator} {factor list}

{repetition operator} ::= {number} TO {number} | EXACTLY {number} |
                          AT MOST {number} | AT LEAST {number}

{factor list} ::= {factor} | {factor list} / {factor}

{factor} ::= {string} | ¬ {string}

{non-terminal} ::= < {non-terminal name} >

{string definition} ::= {string} {match type} {character string}
                        {translation string}

{match type} ::= ALL | ANY

{translation string} ::= {null} | => @ {actual string} @

{character string} ::= @ {actual string} @

The input format allows one specification per logical record. The
beginning of a logical record is signified by a non-blank in the first
column of a physical record. Physical records with a blank in the
first column are catenated to the current logical record, but preceded
by a "§". The "§" may not be coded explicitly but must be inserted
implicitly, as above.
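
The assembly of logical records from physical records can be sketched as below. This is an illustration under the rules just stated; the function name is hypothetical, as are the sample specifications, which merely follow the grammar given above.

```python
def logical_records(physical_records):
    """Join continuation cards to their logical record, inserting the "§"."""
    records = []
    for card in physical_records:
        if not card.strip():
            continue  # ignore wholly blank cards (an assumption)
        if card[0] != " ":
            records.append(card.rstrip())  # non-blank in column 1: new record
        elif records:
            records[-1] += " § " + card.strip()  # continuation preceded by "§"
        else:
            raise ValueError("continuation card before any logical record")
    return records


cards = [
    "<id> EXACTLY 1 LETTER",
    "     AT LEAST 0 LETTER / DIGIT",
    "LETTER ANY @ABC@",
]
for record in logical_records(cards):
    print(record)
```

The second card begins with a blank, so it becomes a second term of the first specification rather than a specification of its own.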

SECTION III:  SYMBOL TABLE SPECIFICATIONS

The  following  specification  is  explained  in  section  4.3.1. 
The  input  format  allows  one  attribute  definition  per  physical  record. 

<attribute definition> ::= <attribute name> <type>

<type> ::= FIXED <fixed initial> | CHARACTER <character initial>

<fixed initial> ::= <null> | INITIALLY <integer>

<character initial> ::= <null> | INITIALLY <string>


SECTION  IV:  USER-SUPPLIED  ROUTINE  SPECIFICATIONS 


These  are  described  in  section  4.3.2.  The  input  format  is 
the  name  of  one  user-supplied  semantics  procedure  per  physical  record. 


SECTION  V:  SPECIFICATION  OF  SEMANTICS  ASSOCIATED  WITH  SYNTAX 

These  specifications  are  described  in  detail  in  section  4.3.3. 
The  input  format  permits  one  semantics  specification  per  physical  record. 
Only those types now operational are described; for others see APPENDIX II.

<semantics specification> ::= <production semantics> | <sequential semantics> |
                              <alternate semantics> | <link semantics> |
                              <routine semantics>

<production semantics> ::= <p-head> <production number>

<sequential semantics> ::= <s-head> <target> <link> <list>

<alternate semantics> ::= <a-head> <target> <link> <list>

<link semantics> ::= <l-head> <l-target> <add> <list>

<routine semantics> ::= <r-head> <routine name>

<p-head> ::= P | PRODUCTION

<s-head> ::= S | SEQUENTIALLY

<a-head> ::= A | ALTERNATELY

<l-head> ::= L | LINK

<r-head> ::= R | ROUTINE

<target> ::= <null> | AT <token>

<l-target> ::= <null> | TO <token>

<link> ::= <null> | LINK

<list> ::= <list element> | <list> , <list element>

<list element> ::= <token> | <op>

<add> ::= <null> | ADD
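
As an illustration of the grammar above, one semantics specification record might be decoded as follows. This is a sketch only: it assumes that items are separated by blanks, that list elements are separated by commas, and that no token contains a blank. The function name, the dictionary layout and the sample tokens and operations are hypothetical.

```python
HEADS = {
    "P": "production", "PRODUCTION": "production",
    "S": "sequential", "SEQUENTIALLY": "sequential",
    "A": "alternate", "ALTERNATELY": "alternate",
    "L": "link", "LINK": "link",
    "R": "routine", "ROUTINE": "routine",
}


def parse_semantics(record):
    """Decode one record into a dictionary describing the specification."""
    words = record.replace(",", " , ").split()
    kind, rest = HEADS[words[0]], words[1:]
    spec = {"kind": kind}
    if kind == "production":
        spec["number"] = int(rest[0])
    elif kind == "routine":
        spec["routine"] = rest[0]
    else:
        # <l-target> uses TO and <add>; the other types use AT and <link>.
        target_word = "TO" if kind == "link" else "AT"
        flag_word = "ADD" if kind == "link" else "LINK"
        spec["target"] = None
        if rest and rest[0] == target_word:
            spec["target"], rest = rest[1], rest[2:]
        flag = bool(rest) and rest[0] == flag_word
        spec[flag_word.lower()] = flag
        if flag:
            rest = rest[1:]
        spec["list"] = [word for word in rest if word != ","]
    return spec


print(parse_semantics("S AT <stmt> LINK LOAD, ADD, STORE"))
```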

APPENDIX II


ARCHITECTURE  OF  THE  PSEUDO-MACHINE 


MEMORY:

The program memory consists of 16 bit words (/360 halfwords),
with  32  bit  integer  and  up  to  256  byte  string  data  memory.  The  size  of 
all  memory  areas  is  alterable  by  simply  changing  a  DECLARE  statement  in 
the  proto-processor  source  program. 


STORED  PROGRAM  CONCEPTS: 

Instructions  are  one  or  two  words  in  length.  The  length  is 
determined by the first word, which is the operation code. No explicit
"next instruction" address (as in the IBM 650) is used; instructions are
executed  sequentially  unless  an  explicit  transfer  instruction  is  executed. 
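
The fetch cycle implied by these rules can be sketched as follows. The operation codes are hypothetical (the actual codes are not listed in this appendix); the sketch shows only that the operation code fixes the instruction length and that control passes to the next sequential instruction unless a transfer is executed.

```python
# Hypothetical operation codes, chosen for illustration only.
PUSH_CONST, ADD, JUMP, HALT = 1, 2, 3, 4
TWO_WORD = {PUSH_CONST, JUMP}  # these carry a second, operand word


def execute(memory):
    """Run the program in `memory` and return the final accumulator stack."""
    pc, stack = 0, []
    while True:
        opcode = memory[pc]
        length = 2 if opcode in TWO_WORD else 1  # length set by first word
        operand = memory[pc + 1] if length == 2 else None
        pc += length  # default: the next sequential instruction
        if opcode == PUSH_CONST:
            stack.append(operand)
        elif opcode == ADD:
            right, left = stack.pop(), stack.pop()
            stack.append(left + right)
        elif opcode == JUMP:
            pc = operand  # an explicit transfer overrides the default
        elif opcode == HALT:
            return stack
        else:
            raise ValueError(f"unknown operation code {opcode}")


program = [PUSH_CONST, 2, JUMP, 6, PUSH_CONST, 99, PUSH_CONST, 40, ADD, HALT]
print(execute(program))
```

In the sample program the transfer skips the two words at locations 4 and 5, so the constant 99 is never pushed.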


DATA  PROCESSING: 

Data  manipulation  is  done  using  an  accumulator  stack,  as  in  the 
Burroughs  B5500  hardware.  All  instructions  operating  on  this  stack  have 
no  explicit  operands  and  are  consequently  only  one  word  in  length;  the 
operands  are  implicitly  the  data  at  the  top  of  the  accumulator  stack,  the 
next  below  it,  etc.,  the  number  of  operands  being  dependent  on  the 
instruction.  Some  instructions,  such  as  those  that  calculate  array  offsets 
from  multiple  subscripts  all  on  the  stack,  have  an  indeterminate  number 
of operands.

The accumulator stack is actually three stacks, one of 8 bit
elements  indicating  the  type  of  data,  and  two  for  the  data  proper:  one 
of  32  bit  elements  for  integers  and  symbol  table  pointers,  and  one  of 
XPL  string  descriptors  for  character  strings.  The  use  of  such  a  scheme 
was  dictated  by  XPL  implementation  restrictions. 
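
The three-stack arrangement can be pictured with the following sketch. It assumes that the three stacks are kept in step, so that one index selects the type code, the integer slot and the string slot of an element; that detail, the class name and the type codes are assumptions, and a Python list stands in for the XPL string descriptors.

```python
from array import array

INTEGER, STRING = 0, 1  # assumed type codes


class TripleStack:
    def __init__(self):
        self.types = array("B")  # 8 bit type codes
        self.fixed = array("i")  # integers and symbol table pointers
                                 # (32 bits on most platforms)
        self.strings = []        # stands in for XPL string descriptors

    def push(self, value):
        is_string = isinstance(value, str)
        self.types.append(STRING if is_string else INTEGER)
        self.fixed.append(0 if is_string else value)      # unused slot is 0
        self.strings.append(value if is_string else "")   # unused slot is ""

    def pop(self):
        kind = self.types.pop()
        number, text = self.fixed.pop(), self.strings.pop()
        return text if kind == STRING else number


stack = TripleStack()
stack.push(7)
stack.push("ABC")
print(stack.pop(), stack.pop())
```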

APPENDIX III


STATE  OF  THE  SYSTEM  AT  TIME  OF  WRITING 


1.  the  parser  tables  generator  and  the  parsing  algorithm  are  operational; 

2.  the scanner tables generator and the scanning algorithm are operational;

3.  the  symbol  table  structure  specifications  and  the  superstructure  are 
operational; 

4.  the  semantic  specifications  of  types  P,  S,  A  and  L  are  operational; 

5.  the  semantic  specification  of  type  G  is  not  used; 

6.  the semantic specification of type T is coded (not operational) only
with the following capability:

T  SET  attribute-name  TO  value 
T  CHANGE  attribute-name  FROM  value  TO  value 
T  CHANGE  attribute-name  FROM  value  TO  value  ELSE  message 
This  permits  symbol  table  entries  referenced  in  the  auxiliary  stack 
to  have  attributes  unconditionally  set  or  set  provided  they  were 
previously  some  value;  otherwise,  an  optional  error  message  is  emitted; 

7.  the  semantic  specification  of  type  D  is  operational  to  the  extent  of 
permitting: 

D PUSH <token index>, <token index>, ...
D POP

This permits stacking and clearing of the auxiliary parse stack to
hold recursively defined tokens referencing the symbol table which
must be assigned attributes after they themselves have been reduced
by the parser;

8.  the  compiler  phase  of  the  processor  is  operational,  with  the 
exception  of  the  routines  handling  T  and  D  semantic  specifications, 
which  are  not  yet  fully  debugged; 

9.  the  emulator  is  partially  operational;  segments  may  be  executed,  and 
the  control  mechanism  for  stacking  returns  from  calls  exists;  the 
allocation  mechanism  for  memory  in  the  pseudo-machine  is  only 
partially  coded;  routines  for  data  operations  on  the  accumulator  stack 
are  not  complete; 

10.  batch  input  stream  control  is  operational,  as  are  various  timing  and 
statistical  routines; 

11.  error routines are operational. Parse traces by the emulator, as
well as source program pointers and checkpoint features, are operational.
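
The effect of the type T specifications of item 6 on the symbol table entries referenced in the auxiliary stack may be sketched as follows. This is an interpretation for illustration, not the coded routines: the function names, the representation of the symbol table as dictionaries, and the attribute names and values are all hypothetical.

```python
def t_set(entry, attribute, value):
    """T SET attribute-name TO value: set unconditionally."""
    entry[attribute] = value


def t_change(entry, attribute, old, new, message=None):
    """T CHANGE attribute-name FROM old TO new [ELSE message].

    Sets the attribute only if it currently holds `old`; otherwise
    returns the optional error message to be emitted.
    """
    if entry.get(attribute) == old:
        entry[attribute] = new
        return None
    return message


def apply_to_stack(aux_stack, symbol_table, action):
    """Apply `action` to every entry referenced in the auxiliary stack."""
    messages = []
    for name in aux_stack:
        message = action(symbol_table[name])
        if message is not None:
            messages.append(message)
    return messages


table = {"X": {"TYPE": "UNDECLARED"}, "Y": {"TYPE": "FIXED"}}
print(apply_to_stack(
    ["X", "Y"], table,
    lambda e: t_change(e, "TYPE", "UNDECLARED", "FIXED", "ALREADY DECLARED"),
))
```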

APPENDIX IV


AN  EXAMPLE 


To  demonstrate  the  generation  of  a  language  processor,  a  very 
simple subset of a pedagogical language (TOY V) devised by J.J. Horning
has  been  selected  and  its  syntax  and  semantics  coded. 

The listings produced by the generator are presented in chronological
order. They are in four groups, one for each phase of the
process:  parse  tables  generation;  scan  tables  generation;  semantics 
tables  generation;  and  the  compilation  and  execution  of  the  source 
program  obtained  by  merging  these  tables  with  the  proto-processor. 

The  parse  tables  generation  follows  immediately. 


NOFSMO AS OPPOSED TO FSMO
BNF AS OPPOSED TO ALGOL60
NOFSMK AS OPPOSED TO FSMK
TABLES AS OPPOSED TO NOTABLES
LALR AS OPPOSED TO SLR

SCAN TABLES GENERATION


INPUT TRACE : FEBRUARY 11, 1971

TOKENS RECOGNIZED BY SCANNER'S TABLES : FEBRUARY 11, 1971

GENERATED PROCESSOR - COMPILATION AND EXECUTION


TODAY  IS  FEBRUARY  11,  1971.  CLOCK  TIME  =  9:4:54.20 